Support for reconfiguring the root storage #94

Open
ajeddeloh opened this issue Dec 12, 2018 · 14 comments

Comments

@ajeddeloh
Contributor

commented Dec 12, 2018

FCOS will have a static, server-side generated initramfs. This means the initramfs needs to know how to find the root filesystem in order to mount it, including starting RAID devices if necessary. However, we do NOT want to start other RAID devices (unless Ignition is starting them as part of the creation process), since users might be writing out configuration to things like /etc/mdadm.conf that they want to use.

On CL we require that RAIDed devices be partitions with a special GPT partition type GUID, and we have a udev rule that starts only those.

For FCOS I can see two options:

  • Do something similar to what CL did. This has the limitation of being unable to use whole disks for RAID, and it requires partition tables where they would not otherwise be needed.
  • Have some sort of file that gets written out to /boot which is a "rootmap". This would describe the "path to root" including what services like mdadm need to be started. Ignition could generate this if we include a way of flagging partitions/RAIDs as "containing root" or it could be user written. If it is user written we should unmount root after Ignition files/umount and then let whatever tooling reads it mount the root itself to ensure the first boot is not "special" in any way.
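
To make the "rootmap" idea concrete, here is a minimal sketch of what such a file and its reader might look like. The JSON layout, the `layers` key, the `md-root` array name and the `plan_root_assembly` helper are all hypothetical; nothing in this issue defines a format. The sketch only builds the commands a reader would run and does not execute them.

```python
import json

# Hypothetical rootmap contents: an ordered list of layers, from the
# physical devices up to the root filesystem itself.
EXAMPLE_ROOTMAP = """
{
  "layers": [
    {"type": "raid", "name": "md-root", "devices": ["/dev/sda4", "/dev/sdb4"]},
    {"type": "filesystem", "device": "/dev/md/md-root", "format": "xfs"}
  ]
}
"""


def plan_root_assembly(rootmap_text):
    """Return the ordered commands needed to reach a mounted root."""
    rootmap = json.loads(rootmap_text)
    commands = []
    for layer in rootmap["layers"]:
        if layer["type"] == "raid":
            # Assemble only the array named in the rootmap, nothing else.
            commands.append(
                ["mdadm", "--assemble", "/dev/md/" + layer["name"]]
                + layer["devices"]
            )
        elif layer["type"] == "filesystem":
            commands.append(
                ["mount", "-t", layer["format"], layer["device"], "/sysroot"]
            )
        else:
            raise ValueError("unknown layer type: " + layer["type"])
    return commands


for command in plan_root_assembly(EXAMPLE_ROOTMAP):
    print(" ".join(command))
```

Because only the layers listed in the file are started, other RAID devices on the machine stay untouched, which is the property asked for above.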
@ajeddeloh ajeddeloh added the meeting label May 1, 2019
@dustymabe

Member

commented May 8, 2019

We discussed this briefly in the meeting today - There are two things I'd like (personal opinion) to do as part of this:

  • develop a list of cases we'd like to support
    • this will allow us to prove out theories about solutions here
  • prefer not building something new
    • of course if we need to we need to, but would like to have strong justification for doing so

Andrew also mentioned that he'd like the solution to be generic enough that if we decide to support new use cases in the future our current solution doesn't prohibit us from doing so. If we build something new, this is reasonable.

@ajeddeloh ajeddeloh removed the meeting label May 15, 2019
@cgwalters cgwalters changed the title from "Root on raid and other complex devices." to "Support for reconfiguring the root storage" Jul 29, 2019
@cgwalters

Member

commented Jul 29, 2019

Lots of discussion around partitioning in #18

Let's repurpose this issue as a tracker for support for reconfiguring the root filesystem in any form; RAID is a subset of that.

Bigger picture, this gets a lot more complex with the combination of FCOS:(ostree, ignition) as opposed to ContainerLinux:(dual partition /usr, ignition), because with CL replacing the rootfs is nearly trivial, but for ostree it is entangled with the OS (which adds flexibility, but does make this problem more complex).

There are two sub-paths to this problem: One where we're not destroying the existing rootfs (e.g. we're just adding a new partition that we want to make into the rootfs), and one where we're intending to replace it.

I think if we're trying to replace the rootfs...we get into an interesting problem domain because we need to pull the ostree repo into RAM temporarily. In most cases...hmm, that should be fine really. But, if we want to optimize things, I could imagine that coreos-installer accepts Ignition, and if a replacement of the root filesystem is requested, we do that at install time as opposed to boot time. We'd then pass the remaining Ignition as /boot/ignition.ign.

And we need support for running ostree in the initramfs - shouldn't be too hard.

The flow would be something like:

  • ignition-disks.service
  • ostree-redeploy-rootfs.service
  • ignition-files.service
@ajeddeloh

Contributor Author

commented Jul 29, 2019

Hadn't considered pulling it into RAM; I like that idea. I was thinking about just pulling over the network, but RAM has some obvious speed/bandwidth cost advantages. There are some tricky bits:

  • Unless we always want to pull the ostree into RAM, we need to detect whether we're going to blow away root.
  • We can't do it if ostree size > available RAM, where available RAM will be slightly smaller than apparent because we need to hold it in RAM while ignition-disks runs. We could optionally fall back to using the network in this case.
  • Before ignition-disks runs, we don't have a rendered config, which means we can't know if we're going to blow away root. This could be solved by splitting config rendering into its own stage.
  • ignition-disks starts before all devices are ready. We can't look at device fields in the config since those could be using udev symlinks that do not exist yet.
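
The "ostree size > available RAM" check could be sketched roughly as follows. This is only an illustration: it assumes the `MemAvailable` field of `/proc/meminfo` is a good enough estimate, and the `reserve_bytes` headroom for the rest of the initramfs is a made-up parameter, not something agreed on in this thread.

```python
def parse_mem_available_kib(meminfo_text):
    """Extract MemAvailable (in KiB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line looks like "MemAvailable:    4194304 kB".
            return int(line.split()[1])
    raise ValueError("MemAvailable not found")


def can_cache_in_ram(repo_bytes, meminfo_text, reserve_bytes=512 * 1024 * 1024):
    """Return True if the ostree repo plus some headroom fits in RAM.

    reserve_bytes is a hypothetical safety margin for whatever else
    has to stay resident while ignition-disks runs.
    """
    available_bytes = parse_mem_available_kib(meminfo_text) * 1024
    return repo_bytes + reserve_bytes <= available_bytes


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    repo_size = 1536 * 1024 * 1024  # example figure only
    print("cache in RAM" if can_cache_in_ram(repo_size, meminfo) else "fall back to network")
```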

> I could imagine that coreos-installer accepts Ignition, and if a replacement of the root filesystem is requested, we do that at install time as opposed to boot time

This doesn't help on clouds. I'd rather have a unified way. In some ways it would be easier to just do all of disks then, but that has its own challenges and deserves its own ticket.

Proposal:

  • Split ignition disks into render and disks
  • Detection could work a couple ways:
    • Require that the partition containing root has partlabel=root. Cache to ram if we find any partition entries with "label": "root". We can't (reasonably) wait on udev for the device since if it's on raid or similar it won't become available until after ignition-disks. This generates false positives for cases like resizing root to be bigger.
    • Require that the filesystem containing root has label=root. Cache to ram if we find any filesystem entries with "label": "root". I can't think of a reason to include a rootfs definition matching the default anyway.
  • ignition-[u]mount needs to learn to ignore filesystems with "path": "/" since the initramfs owns mounting those
  • Still do not support things like split /usr or /etc. This means ostree-redeploy-rootfs needs to happen before ignition mount.
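
The filesystem-label variant of the detection step could look something like the sketch below. It assumes the rendered config follows an Ignition spec 3.x style layout where each entry under `storage.filesystems` carries a top-level `label`; the function name `config_recreates_root` is invented for illustration and is not part of Ignition.

```python
import json


def config_recreates_root(rendered_config_text, root_label="root"):
    """Return True if any filesystem entry in the config is labeled as root.

    Assumes an Ignition spec 3.x style layout:
    storage.filesystems[].label
    """
    config = json.loads(rendered_config_text)
    filesystems = config.get("storage", {}).get("filesystems", [])
    return any(fs.get("label") == root_label for fs in filesystems)


example = """
{
  "ignition": {"version": "3.0.0"},
  "storage": {
    "filesystems": [
      {"device": "/dev/disk/by-partlabel/root", "format": "ext4",
       "wipeFilesystem": true, "label": "root"}
    ]
  }
}
"""
if config_recreates_root(example):
    print("cache the ostree repo in RAM before ignition-disks runs")
```

Note that this looks only at the label and never resolves the `device` path, so it does not depend on udev symlinks being present.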

Sequence would look like:

  1. ignition-render
  2. detect-root-change (needs to understand ignition spec but probably shouldn't be part of the ignition project)
  3. ignition disks
  4. rootfs mounted
  5. ostree-redeploy-rootfs
  6. ignition mount
  7. ignition files
  8. ignition umount
  9. switch root

This still leaves open the question of "how do we find/mount root on subsequent boots".

An unrelated question is: do we support blowing away /boot? My gut says no. On x86_64 you can blow away /efi when running on BIOS (may need to mask the mount unit for it) or BIOS-BOOT when running on EFI. I don't think we can get rid of /boot though. It's also partition 1, so it shouldn't get in the way of anything, and it's not huge. We probably could find a way to allow moving it, but I really don't think it's worth the complexity that would entail.

@cgwalters

Member

commented Jul 29, 2019

> We can't do it if ostree size > available ram, where available ram will be slightly smaller than apparent because we need to hold it in ram while ignition disks runs

Yeah I had the same hesitation but then I realized two things. First, the current FCOS is 1.5G. I think if you're doing something like RAID we can presume you have a "serious" system and have at least say 8G of RAM.

Second, we could enable zram temporarily.

@ajeddeloh

Contributor Author

commented Jul 29, 2019

I realized I initially missed the bit about "what if we're just moving the rootfs and the old one will still be accessible". Do you think it's worth the complexity to implement that? If so I think we ought to do this in two phases, handling the generic case first then treating that as an optimization.

@ajeddeloh

Contributor Author

commented Jul 29, 2019

> Second, we could enable zram temporarily.

Neat! I think we ought to handle that second and only enable it when we don't have enough ram to do it without.

@cgwalters

Member

commented Jul 29, 2019

> We can't (reasonably) wait on udev for the device since if it's on raid or similar it won't become available until after ignition-disks. This generates false positives for cases like resizing root to be bigger.

I'm confused...doesn't this stuff imply Ignition running twice? We know the initial rootfs we're booting from isn't on RAID because our default disk image doesn't do that.

@ajeddeloh

Contributor Author

commented Jul 29, 2019

> I'm confused...doesn't this stuff imply Ignition running twice? We know the initial rootfs we're booting from isn't on RAID because our default disk image doesn't do that.

I'm not sure why it would imply running Ignition twice? Unless you're referring to getting the config, in which case that's why I'm suggesting we split that into its own stage. I'm saying it's nontrivial to know with 100% certainty that the root is being destroyed since the block device has aliases that we would need to wait for. I suppose we could, but that makes the live pxe case harder since there isn't a device and some aliases aren't known before boot.

Consider if FCOS is installed with /dev/sda4 being root. That can be referenced by /dev/sda4, /dev/disk/by-label/root and /dev/disk/by-partlabel/root (and some others that are system dependent). I'm saying we can't detect if a user is blowing away the partition because:

  1. We don't know what disk we're installed on (maybe we should add a udev rule for that? create something like /dev/disk/installed. Seems useful in general and could improve config portability), so if they're wiping the filesystem with device: sda4 we don't know if that is actually the rootfs or not since we might be installed on sdb instead.
  2. Those symlinks may not exist yet so if the user is using them we don't know where they point. Things like /dev/disk/by-path/* are going to be dependent on the hardware so we really don't know if they're pointing to the rootfs. We can't wait for them either because while /dev/disk/by-label/root pointing to the rootfs is obvious, something like /dev/disk/by-path/<something> is not. We don't have a way to differentiate those from other symlinks that could appear as a result of ignition-disks. We don't know which ones are safe to wait on and which are not.

Because of that, we should be detecting that we're creating a new rootfs instead of detecting that the old one is being deleted. We can do that either with partitions, looking for a partlabel (which can give false positives when just resizing), or with the filesystem label (which can give false positives if they create an entry that matches the on-disk FS, though I'm not sure why you'd do that; see https://github.com/coreos/ignition/blob/master/doc/operator-notes.md#filesystem-reuse-semantics).

@ajeddeloh ajeddeloh added meeting and removed meeting labels Jul 30, 2019
@ajeddeloh

Contributor Author

commented Jul 31, 2019

We discussed this at the FCOS IRC meeting today, mostly about how to handle the second boot. There are essentially five options:

  1. Do what Container Linux does and require partitions containing complex devices (e.g. LUKS volumes, parts of RAID, etc.) to use GPT type GUIDs to specify what they are. The initramfs can then carry udev rules to start these devices.
    • Devices MUST be GPT partitioned to be used (can't do RAID on the device itself, only partitions)
    • Can't carry extra configuration for things like mdadm, encryption
  2. Do what mutable OSes do and regenerate the initramfs to include configuration to start complex devices
    • Dracut's default modules can be sloppy (do things like start extra devices we don't need to, see rd.auto)
    • Generating new initramfses means creating new bootloader entries
    • Doesn't require carrying a lot of new code in the initramfs
    • Open question: would we generate this in the initramfs or in the real root?
    • Makes debugging harder since we no longer know exactly what's in the initrd
  3. Write a config file to /boot that maps out how to find the root device
    • Requires a new tool to read that config and mount/start/decrypt things (systemd generator would be nice but we'd need to mount /boot which makes it harder and would slow down boot if unused).
    • Config file would be easy to extend, can handle arbitrary complexity
    • Isn't portable to distros that don't have a separate boot (though complex root is harder in general in that case)
  4. Write kernel command line arguments that specify how to find the root device.
    • Requires a new tool to read that config and mount/start/decrypt things. Could be a systemd generator
    • kargs aren't great if the user is doing something complex (e.g. LUKS on RAID on RAID).
    • kcmdline max length is 4k
    • kargs are transparent, matches with "traditional" ways of specifying root devices.
    • kargs are simple to read/implement
  5. Do 3, but use an extra initrd instead of writing a file to /boot
    • All the same points as 3, except:
    • can write a systemd generator
    • creating an initrd for that is harder especially if the user writes it by hand (see below)
    • would need a default "noop" initrd that can be replaced so grub doesn't throw a fit when it can't find it. Not sure if ostree changes would be needed for that.
    • initrd can just be a file ignition writes to /boot

In cases 2-5 there's also the question of "how do we specify what is needed to start the root disk?" In case 1 it's implied that you specify it with type GUIDs. We could:

  • Try to infer from the ignition config.
    • would require users use well known symlinks to devices (e.g. /dev/md/name instead of /dev/mdXXX)
    • would be hard to debug if we inferred wrongly
    • not very explicit what's going on
  • Carry extra fields in FCCL for "contains_root" (not married to that name) which FCCT (case 3, 4, 5) or dracut (case 2) uses to write out the config/kargs or create the new initramfs.
    • more explicit, but not too much extra work for the end user
    • Doesn't leave much room for things like specifying options for encryption, mdadm, etc
  • Require the user manually create the config (3), kargs (4), or config initrd (5) and use Ignition to write them.
    • Nice since it's explicit. We're not doing anything the user doesn't expect
    • Bad since a lot of what they write would be largely redundant.
@ajeddeloh

Contributor Author

commented Aug 20, 2019

Bit of data: looks like the kcmdline size is 2048 on all the arches we care about except s390x where it is 896.
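
As a rough illustration of what those limits mean for the kargs option, the sketch below checks whether a set of extra arguments still fits. The helper name `kargs_fit` is invented, the limits are the figures quoted above, and treating the limit as a buffer that must also hold a terminating NUL is a conservative assumption on my part.

```python
# Per-architecture limits as quoted above; anything not listed uses 2048.
COMMAND_LINE_LIMITS = {"s390x": 896}
DEFAULT_LIMIT = 2048


def kargs_fit(existing_cmdline, new_kargs, arch):
    """Return True if the existing command line plus new kargs fits the limit."""
    limit = COMMAND_LINE_LIMITS.get(arch, DEFAULT_LIMIT)
    parts = [existing_cmdline.strip()] + list(new_kargs)
    combined = " ".join(part for part in parts if part)
    # Strictly less than the limit, leaving one byte for a terminating NUL.
    return len(combined.encode()) < limit


# Example karg values are placeholders, not real UUIDs.
print(kargs_fit("root=/dev/md0", ["rd.md.uuid=abc"], "x86_64"))
print(kargs_fit("a" * 890, ["rd.luks.uuid=x"], "s390x"))
```

A deeply nested root (e.g. LUKS on RAID on RAID) needs one or more arguments per layer, so the s390x limit in particular could be reached fairly quickly.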

@lucab lucab added the kind/design label Aug 27, 2019
@cmurf


commented Sep 11, 2019

Carrying over from #18 since it's closed.

Re: dd and GPT: 512n and 512e can share one GPT, 4Kn needs a different GPT. I don't see how they can be merged without invalidating the Partition Entry Array CRC32, which is computed on the content between the header and the First Usable LBA. And the First Usable LBA on 512-byte sector devices must be 34 or higher.
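
The "34 or higher" figure follows from the standard minimum GPT layout, and the same arithmetic shows why a 4Kn image differs. The sketch below assumes the usual 128 entries of 128 bytes each (the 16 KiB minimum entry array); the helper names are made up for illustration.

```python
GPT_ENTRY_COUNT = 128  # usual number of partition entries
GPT_ENTRY_SIZE = 128   # bytes per entry


def first_usable_lba(sector_size):
    """Lowest possible First Usable LBA for a given logical sector size.

    LBA 0 holds the protective MBR, LBA 1 the GPT header, and the
    partition entry array starts at LBA 2.
    """
    array_bytes = GPT_ENTRY_COUNT * GPT_ENTRY_SIZE
    array_sectors = -(-array_bytes // sector_size)  # ceiling division
    return 2 + array_sectors


def header_byte_offset(sector_size):
    """Byte offset of the primary GPT header (always at LBA 1)."""
    return 1 * sector_size


for size in (512, 4096):
    print(size, first_usable_lba(size), header_byte_offset(size))
```

With 512-byte sectors the entry array occupies LBAs 2 to 33, so the first usable LBA is 34; with 4096-byte sectors it occupies LBAs 2 to 5. The header also sits at a different byte offset (512 versus 4096), so the same bytes written by dd cannot be a valid GPT for both sector sizes.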

Re: mkfs.xfs: it uses a 4096-byte sector size on 512e and 4Kn, but 512 bytes on 512n.

The two points above combined suggest three raw images, unless something gives.

Also, when dd'd to a different sized drive, the GPT becomes invalid, so that should probably get fixed as a first step.

Each RAID layout should get its own mkfs.xfs run for setting stripe width/unit.

On 4Kn drives, minimum EFI system partition size is 256MiB if it's FAT32, which the UEFI spec does suggest it should be, sorta. But mkdosfs only starts using FAT32 above something near 550MiB, unless -F 32 is used. fedora-coreos-30.20190905.0-metal.raw has a FAT16 ESP. Obviously it works, and I haven't ever heard of FAT16 not working on anything.

Re: ZRAM: it dynamically allocates RAM, so setting up swap on ZRAM at 1:1 to RAM has inconsequential overhead on a system that doesn't need it. But you could do a test similar to Anaconda's implementation and only start it for low-memory devices. I'd like to see this upstream version working and rebase the myriad implementations on it.
https://github.com/systemd/zram-generator

@cgwalters cgwalters self-assigned this Sep 25, 2019
@cgwalters

Member

commented Sep 25, 2019

I'm working on the fundamentals of this in coreos/ignition-dracut#107

First MVP is to support this fcct:

```yaml
# Example of redeploying the rootfs as ext4
variant: fcos
version: 1.0.0
storage:
  filesystems:
    - device: /dev/disk/by-partlabel/root
      format: ext4
      wipe_filesystem: true
      label: root
```

Tracker for issues:

  • Get jlebon's SELinux/initramfs kernel patch into Fedora kernel
  • coreos/coreos-assembler#781
  • Preserve SELinux labels (probably involves writing the code in rpm-ostree)
  • Documentation of what's supported and what isn't
@cgwalters

Member

commented Sep 25, 2019

To reiterate on some discussion in that PR, this is to help enable openshift/enhancements#15 without requiring the root filesystem to come in a LUKS container by default (and relatedly, ensure that when a user chooses LUKS it can be encrypted from the start with a TPM2-sealed key, etc.)

@darkmuggle darkmuggle closed this Sep 25, 2019
@darkmuggle darkmuggle reopened this Sep 25, 2019
@darkmuggle


commented Sep 25, 2019

Sorry, I didn't mean to close. I have NO idea why "close and comment" is next to the comment button.

cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Sep 25, 2019
All this does is put the immutable bit on the target directory.
The intention is to replace this bit to start:
https://github.com/coreos/coreos-assembler/blob/8b205bfbb971707382ace76bbb39e46ed3fc560d/src/create_disk.sh#L229

However, the real goal here is to add code in this file
to handle redeploying the rootfs for Fedora CoreOS which
combines OSTree+Ignition:
coreos/fedora-coreos-tracker#94

Basically doing this in proper Rust is going to be a lot
nicer than shell script in dracut modules.  Among other
details, coreutils `mv` doesn't seem to do the right thing
for SELinux labels when policy isn't loaded.
cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Sep 25, 2019
cgwalters added a commit to cgwalters/coreos-assembler that referenced this issue Sep 27, 2019
This is prep for reprovisioning the rootfs:
coreos/fedora-coreos-tracker#94

Requires: coreos/fedora-coreos-config#187
(Do not merge until RHCOS has also rebased to FCOS with that change)

Closes: coreos#781
cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Sep 27, 2019
cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Sep 27, 2019
jlebon added a commit to cgwalters/rpm-ostree that referenced this issue Sep 30, 2019
rh-atomic-bot added a commit to coreos/rpm-ostree that referenced this issue Sep 30, 2019
Closes: #1911
Approved by: jlebon
pcmoore added a commit to SELinuxProject/selinux-kernel that referenced this issue Oct 1, 2019
Currently, the SELinux LSM prevents one from setting the
`security.selinux` xattr on an inode without a policy first being
loaded. However, this restriction is problematic: it makes it impossible
to have newly created files with the correct label before actually
loading the policy.

This is relevant in distributions like Fedora, where the policy is
loaded by systemd shortly after pivoting out of the initrd. In such
instances, all files created prior to pivoting will be unlabeled. One
then has to relabel them after pivoting, an operation which inherently
races with other processes trying to access those same files.

Going further, there are use cases for creating the entire root
filesystem on first boot from the initrd (e.g. Container Linux supports
this today[1], and we'd like to support it in Fedora CoreOS as well[2]).
One can imagine doing this in two ways: at the block device level (e.g.
laying down a disk image), or at the filesystem level. In the former,
labeling can simply be part of the image. But even in the latter
scenario, one still really wants to be able to set the right labels when
populating the new filesystem.

This patch enables this by changing behaviour in the following two ways:
1. allow `setxattr` if we're not initialized
2. don't try to set the in-core inode SID if we're not initialized;
   instead leave it as `LABEL_INVALID` so that revalidation may be
   attempted at a later time

Note the first hunk of this patch is mostly the same as a previously
discussed one[3], though it was part of a larger series which wasn't
accepted.

[1] https://coreos.com/os/docs/latest/root-filesystem-placement.html
[2] coreos/fedora-coreos-tracker#94
[3] https://www.spinics.net/lists/linux-initramfs/msg04593.html

Co-developed-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Victor Kamensky <kamensky@cisco.com>
Signed-off-by: Jonathan Lebon <jlebon@redhat.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
jlebon added a commit to cgwalters/rpm-ostree that referenced this issue Oct 3, 2019
cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Oct 3, 2019
cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Oct 3, 2019
cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Oct 4, 2019
openshift-merge-robot added a commit to coreos/rpm-ostree that referenced this issue Oct 4, 2019
StollD pushed a commit to StollD/linux-fedora that referenced this issue Oct 12, 2019
StollD pushed a commit to StollD/linux-fedora that referenced this issue Oct 12, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 12, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 12, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 13, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 13, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 13, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 13, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 14, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 14, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 14, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 15, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 15, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 18, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 18, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 18, 2019
qboid-bot added a commit to StollD/linux-fedora that referenced this issue Oct 21, 2019
6 participants