Consider supporting Anaconda to install FCOS/RHCOS in the future #1646

Open
jlebon opened this issue Jan 10, 2024 · 11 comments

Comments

@jlebon
Member

jlebon commented Jan 10, 2024

Anaconda nowadays supports installing bootc-compatible container images with the ostreecontainer keyword. In the future, it may directly use bootc instead to carry out the installation (see rhinstaller/anaconda#5197).
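
For readers who haven't seen the keyword, here is a rough sketch of a kickstart that uses it, written as a small Python helper that renders the text. The image reference `quay.io/example/custom-os:latest` and the helper name `render_kickstart` are made up for illustration, and the partitioning lines are just one plausible layout, not something this issue prescribes.

```python
def render_kickstart(image_ref: str) -> str:
    """Render a minimal kickstart that installs a bootable container image.

    Illustrative only: the partitioning commands are an assumed layout.
    """
    lines = [
        "text",
        # Partitioning is spelled out from scratch in the kickstart itself.
        "zerombr",
        "clearpart --all --initlabel",
        "reqpart --add-boot",
        "part / --grow --fstype xfs",
        # Install the OS content from the container image.
        f"ostreecontainer --url {image_ref}",
        "reboot",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical image reference, used only as a placeholder.
    print(render_kickstart("quay.io/example/custom-os:latest"))
```

Note that the whole disk layout lives in the kickstart here, which is relevant to the trade-offs discussed further down.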

Currently, there is work underway to have osbuild-based tooling to generate Anaconda ISOs with the bootable container embedded which will carry out the installation: osbuild/bootc-image-builder#58.

On the CoreOS side, we notably decided very early on to provide a disk image-based flow in the bare metal case so that it closely resembles the cloud case.

With other bootable container variants likely eventually supporting an Anaconda flow, we should consider whether this is something we also want to support in CoreOS.

@jlebon
Member Author

jlebon commented Jan 10, 2024

To start the conversation, we should probably list some of the advantages and disadvantages of this. Will add some tomorrow.

@jlebon
Member Author

jlebon commented Jan 11, 2024

So I think there are two primary arguments for supporting an Anaconda flow:

  1. Anaconda is how most of RHEL/Fedora is currently installed and kickstart is very familiar to a lot of people.
  2. Anaconda has its own storage stack (blivet) which already knows how to set up complex storage like multipath, iSCSI, RAID, etc. In theory, this means that we could drain some of that stuff from the overlays.

And here are three arguments against:

  1. The obvious one is that it breaks the cloud/bare metal symmetry we've upheld so far. That property has had drawbacks. For example:

    • RAID1 support doesn't make a whole lot of sense in clouds, but still had to be awkwardly implemented as part of our first boot logic
    • In an Anaconda world, the XFS agcount issue would've only affected cloud nodes with large boot disks and not bare metal

    But it also provides benefits:

    • Things that do span both cloud and metal use cases only had to be implemented once, for example LUKS encryption and partitioning (e.g. putting /var on a separate volume).
    • A metal disk image means we can pretty much guarantee that the disk will boot. It's built exactly like we build the rest of our cloud images and we can subject it to lots of CI that carries more assurances than CI at the Anaconda level.
    • A metal disk image also allows providing a simple, opinionated, and flexible UX around it to customize the installation flow and installed system. Anaconda OTOH is by design built to be generic and so has lots of knobs, many of which don't apply.
  2. Unless we're planning to stop producing metal disk images, supporting both a disk image-based and an Anaconda-based install will create confusion, especially if each approach has different capabilities. Some things would have to be re-implemented in the Anaconda path (e.g. injecting the Ignition config).

  3. Anaconda is extremely powerful and overall makes it easier for users to shoot themselves in the foot. For example, partitioning is defined right there in the kickstart and is trivial to get wrong. It's also true that Ignition allows arbitrary partition changes, but by design it works from an existing partition table definition rather than defining one from scratch (so one only needs to express e.g. the new partition to add).
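
To make that last point concrete, here is a minimal sketch of what "only express the new partition" can look like, built as a Python dict shaped like an Ignition config. The field names follow my understanding of the Ignition v3 spec; the helper name `var_partition_config`, the spec version, and the choice of a `var` partition are assumptions for illustration.

```python
import json


def var_partition_config(boot_disk: str = "/dev/disk/by-id/coreos-boot-disk") -> dict:
    """Build an Ignition-style config that only adds a partition for /var.

    Sketch only: a real config would likely also constrain the root
    partition size and add a mount unit for /var.
    """
    return {
        "ignition": {"version": "3.4.0"},
        "storage": {
            "disks": [
                {
                    "device": boot_disk,
                    # Keep the partition table that ships in the disk image.
                    "wipeTable": False,
                    "partitions": [
                        # Only the new partition is listed. A size of 0 asks
                        # for the partition to be as large as possible.
                        {"label": "var", "sizeMiB": 0},
                    ],
                }
            ],
            "filesystems": [
                {
                    "device": "/dev/disk/by-partlabel/var",
                    "format": "xfs",
                    "path": "/var",
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(var_partition_config(), indent=2))
```

The existing partitions (boot, EFI, root) never appear in the config, since they come from the image. A kickstart, by contrast, has to state the full layout.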

@Nemric

Nemric commented Jan 12, 2024

I'm not sure I understood everything you wrote. My biggest concern is the metal live image (PXE boot), which lets me treat FCOS like a disposable container that is a newborn child (one that can remember its previous life in /var ^^) at every reboot.
What about Anaconda and the live OS? Do I need to worry about it?

@jlebon
Member Author

jlebon commented Jan 12, 2024

There are no plans currently to stop supporting live PXE with persistent /var.

@cgwalters
Member

A metal disk image means we can pretty much guarantee that the disk will boot. It's built exactly like we build the rest of our cloud images and we can subject it to lots of CI that carries more assurances than CI at the Anaconda level.

One thing this definitely relates to is rhinstaller/anaconda#5197, which would bring us closer to a world where, even if Anaconda is being used, a kickstart that is just the bootc verb leaves Anaconda doing very little: it is effectively a live ISO running podman, which in turn runs the target container image. For example, the mkfs.xfs that is used lives in that container, not in Anaconda. This type of stuff adds reproducibility.

The trickier balance is around enabling more complex anaconda features like LVM and RAID1 while still deferring as closely as possible to the container.

@jlebon
Member Author

jlebon commented Jan 26, 2024

I'm going to reply to this comment here since I think it belongs better here and that thread overall is about more than just Anaconda.

One topic threading through this is whether we aim to support the "dd raw disk image to metal" flow for bootc installs. I personally am trying hard to back away from that because my experience from the CoreOS side is that while it covers the 85% cases extremely well, it causes deep problems in the harder ones like iSCSI, multipath, etc.

Actually, thinking on this, neither iSCSI nor multipath are good examples. In both cases, whole block devices are involved, so there isn't really a mismatch with the image model. And on the configuration side, we're just using the same kargs and dracut code as in traditional systems. The only tricky bit is adapting our initramfs to not get in the way of existing functionality. Another way to say this is that both iSCSI and multipath are expected to be set up at installation time and require no involvement from e.g. Ignition.

RAID1, LUKS, and LVM are good examples that clash with the disk image model. Perhaps for RAID1 we could've done it at coreos-installer time to make the initramfs code simpler (though we already had the logic there for root reprovisioning, which was the bulk of the added complexity). LUKS is an interesting case, because doing it from the initramfs means you can easily use it in all image-based platforms too (e.g. QEMU, OpenStack, but I've also seen people use NBDE in e.g. Azure).

While the root reprovisioning code in the initramfs is filesystem-based, (1) we benefit from that code being under our control, and (2) we benefit from starting from a known state. That said, one thing that'd help a lot I think is lifting all that stuff out of bash scripts and into e.g. rdcore (and on that topic, maybe we can try to share code with bootc's takeover install support).

All this to say that while it has its implementation warts, I think we've done quite well with the metal image overall and I'm not quite convinced it's time to move away from it. (I haven't talked at all about the UX here, which is a huge part of the story of course.)

Interested to hear what others think!

@cgwalters
Member

Actually, thinking on this, neither iSCSI nor multipath are good examples. In both cases, whole block devices are involved, so there isn't really a mismatch with the image model

(disk image)

The only tricky bit is adapting our initramfs to not get in the way of existing functionality.

Right...but these are related because if we do installation via filesystem layout and not disk images, then we don't have a complex initramfs.

LUKS is an interesting case, because doing it from the initramfs means you can easily use it in all image-based platforms too (e.g. QEMU, OpenStack, but I've also seen people use NBDE in e.g. Azure).

The way I want to push this going forward is there are two paths:

  • Generating custom pre-built disk images from a container alongside desired storage layout; this can be set up for LUKS or LVM-and-LUKS etc. as desired without doing a complex dance in the initramfs
  • takeover installs where we treat the entire booted OS as just an initramfs; instead of carrying provisioning tools in the initramfs, one can use the full power of anything you have in a custom container image. That path can clearly support LUKS in the cloud equally well, albeit with an extra reboot or at least an extra userspace-only restart. But I think that's really fine because people doing that stuff are going for longer-lived instances.

While the root reprovisioning code in the initramfs is filesystem-based, (1) we benefit from that code being under our control, and (2) we benefit from starting from a known state.

I think bootc install to-disk, for example, is also quite highly opinionated and under the OS writer's control. Now, to-filesystem adds a good bit more flexibility, but there's still a lot that is tested as a unit.

I think we've done quite well with the metal image overall and I'm not quite convinced it's time to move away from it.

In practice, we're clearly not going to just drop it in the near future. It's a question of emphasis though, once there are new alternatives.

@travier
Member

travier commented Feb 14, 2024

  • Generating custom pre-built disk images from a container alongside desired storage layout; this can be set up for LUKS or LVM-and-LUKS etc. as desired without doing a complex dance in the initramfs

In the disk image case, LUKS requires re-provisioning in all cases, either when using a null cipher or with the current Ignition-style re-provisioning.

  • takeover installs where we treat the entire booted OS as just an initramfs; instead of carrying provisioning tools in the initramfs, one can use the full power of anything you have in a custom container image. That path can clearly support LUKS in the cloud equally well, albeit with an extra reboot or at least an extra userspace-only restart. But I think that's really fine because people doing that stuff are going for longer-lived instances.

We can boot the system fully in RAM with a karg that I don't remember. Maybe the path forward is to move Ignition to using this mode for the first boot when doing complex filesystem re-provisioning instead of having all the logic in the initramfs. This would let us keep a "fast-path" for non-complex-storage Ignition configs while doing more complex storage setups not in the initramfs.
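
A rough sketch of how that split could be decided, assuming the decision is made by inspecting the Ignition config. Everything here is hypothetical: the function names, the mode names `"fast-path"` and `"ram-boot"`, and the heuristic itself (any LUKS or RAID section, or a wipe of the filesystem labelled `root`, counts as complex) are illustrative and not an existing Ignition feature.

```python
def needs_complex_storage(config: dict) -> bool:
    """Guess whether an Ignition-style config asks for complex storage setup.

    Hypothetical heuristic: LUKS or RAID sections, or wiping the filesystem
    labelled "root", are treated as complex.
    """
    storage = config.get("storage", {})
    if storage.get("luks") or storage.get("raid"):
        return True
    for fs in storage.get("filesystems", []):
        # Reformatting root means the OS content has to be moved aside first.
        if fs.get("label") == "root" and fs.get("wipeFilesystem"):
            return True
    return False


def first_boot_mode(config: dict) -> str:
    """Pick a first-boot mode name (both names are made up for this sketch)."""
    return "ram-boot" if needs_complex_storage(config) else "fast-path"


if __name__ == "__main__":
    simple = {"storage": {"files": [{"path": "/etc/hostname"}]}}
    encrypted = {"storage": {"luks": [{"name": "root"}]}}
    print(first_boot_mode(simple), first_boot_mode(encrypted))
```

Configs that only write files or units would stay on the existing quick path, and only the storage-heavy ones would pay for the RAM-boot detour.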

@cgwalters
Member

In the disk image case, LUKS requires re-provisioning in all cases, either when using a null cipher or with the current Ignition-style re-provisioning.

No, one can do online incremental re-encryption. That's not "re-provisioning" - it's a cheap operation that doesn't require moving the OS to RAM.

The reason we stopped doing that for the original RHCOS case is that it's a magical special case and we wanted to support generalized other partitioning too (including e.g. LUKS-on-RAID, switching filesystems etc.) in a consistent fashion.

But for the cloud case, it makes total sense to generate a cloud disk image that has a LUKS layout and then on firstboot start a cryptsetup-reencrypt operation (which could bind to the machine-local TPM2, or more complex things).

Baremetal anaconda style installs can just set up the desired partitioning directly (as can also be done in a basic setup via bootc install).

We can boot the system fully in RAM with a karg that I don't remember. Maybe the path forward is to move Ignition to using this mode for the first boot when doing complex filesystem re-provisioning instead of having all the logic in the initramfs. This would let us keep a "fast-path" for non-complex-storage Ignition configs while doing more complex storage setups not in the initramfs.

Er...if we're running the root from RAM how would in-place OS updates and in general persistent data work?

@cgwalters
Member

Stated simply: if we ship a first-class flow for the combination of three cases:

  1. generating custom (cloud/virt) disk images pre-set up with desired partitioning
  2. bootc-style takeover installs
  3. Anaconda

Then Ignition partitioning isn't necessary. The users who want to boot as quickly as possible in the cloud can choose 1; those who are OK with an extra reboot for longer-lived instances and don't want to maintain disk images can choose 2.
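
One way to read that as a decision rule, sketched in Python. The enum, the function name `choose_flow`, and the two boolean inputs are assumptions made for this sketch. Mapping bare metal straight to Anaconda is one interpretation of the comments above, not a stated policy.

```python
from enum import Enum


class InstallFlow(Enum):
    PREBUILT_DISK_IMAGE = "custom disk image with partitioning baked in"
    TAKEOVER_INSTALL = "bootc-style takeover install"
    ANACONDA = "Anaconda"


def choose_flow(bare_metal: bool, wants_fastest_boot: bool) -> InstallFlow:
    """Pick an install flow for a node that needs custom partitioning.

    Assumed reading of the thread: bare metal sets up partitioning at
    install time, while cloud/virt users trade boot speed against the
    cost of maintaining their own disk images.
    """
    if bare_metal:
        return InstallFlow.ANACONDA
    if wants_fastest_boot:
        # No extra reboot, but someone has to build and maintain the image.
        return InstallFlow.PREBUILT_DISK_IMAGE
    # Longer-lived instance that can afford one extra reboot.
    return InstallFlow.TAKEOVER_INSTALL


if __name__ == "__main__":
    print(choose_flow(bare_metal=False, wants_fastest_boot=True).value)
```

None of the three branches involves partitioning from the initramfs, which is the point being made.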

@cgwalters
Member

To be clear though, in a way...because Ignition is such a central API to what FCOS is today, the above is more "container/bootc based Fedora", not FCOS. But then it also does make sense to think about crossover/intersection between the two, such as generating a FCOS-derived container and making a disk image from it, but still using Ignition at runtime for some configuration, etc.

4 participants