Host Installer for Fedora CoreOS (bare metal) #50

Closed
dustymabe opened this issue Sep 12, 2018 · 31 comments


@dustymabe
Member

dustymabe commented Sep 12, 2018

Since we plan to boot from a common "image" on first boot in Fedora CoreOS, we'd like an installer that can write that image to a disk in a bare metal environment (cloud/VM environments should use the related image artifacts or pre-uploaded cloud artifacts). Anaconda can do this (i.e. write a pre-baked image to disk), but it might be overkill for what we actually need: we don't want the installer to do any customization, since all of it should be performed by Ignition on first boot. In the past, Container Linux used a small script (basically a wrapper around dd) as its installer.

Let's come up with a strategy for a host installer for Fedora CoreOS and implement it.
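
To make the "basically wrapping dd" idea concrete, here is a minimal Python sketch of the core step: copying a pre-baked image onto a target in fixed-size blocks and returning a digest of what was written. It is only an illustration under stated assumptions; `write_image` and `CHUNK_SIZE` are hypothetical names and are not taken from the Container Linux script or any FCOS tool.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical default, comparable to dd bs=4M


def write_image(image_path, target_path, chunk_size=CHUNK_SIZE):
    """Copy image_path onto target_path; return the SHA-256 of the bytes written."""
    digest = hashlib.sha256()
    # "r+b" opens an existing block device (or pre-created file) without truncating it.
    with open(image_path, "rb") as src, open(target_path, "r+b") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # push the data to the device before any reboot
    return digest.hexdigest()
```

On a real machine the target would be a device node such as /dev/sda and the call would need root privileges; for experiments, a pre-created regular file works as a stand-in for the disk.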

@bgilbert
Contributor

Here's the CL install script. It's designed to be run on any Linux distribution. The script is on the verge of too clever to be written in bash, and can't do anything that would require exotic tools on the host. For those reasons, I've been in favor of rewriting it in Go for a while now (coreos/bugs#308).

A typical CL bare-metal install story is to PXE-boot CL with an Ignition config that runs coreos-install (which is shipped as part of CL) and reboots. A similar approach could work for FCOS. Note that that requires FCOS to support PXE to RAM, but that functionality will already be needed by a sizable number of users migrating from CL.

@ashcrow
Member

ashcrow commented Sep 12, 2018

I'm in favor of spiking on the cl install script and seeing how to port it over to *cos. I was looking at it last night and it's actually really nice, simple, and has as much, if not more, gpg key than code 😆.

@Promaethius

Promaethius commented Sep 14, 2018

Fedora Atomic PXE had a nice flow to it. rpm-ostree already generates an initramfs and could work cleanly with CoreOS/Matchbox for stateless deployments. As an added benefit, end users who already deploy matchbox (for example, https://github.com/poseidon/typhoon) would need minimal modifications.

@dustymabe
Member Author

@jlelli has opened up an issue against coreos-assembler with a POC in it. There are several different ways we can move forward from here:

  • use anaconda environment (i.e. kernel/initrd but not the install code itself) like @jlelli has proved out
  • insert the %pre script @jlelli is using into a kernel/initrd that we create from scratch in coreos-assembler

A secondary goal is that whatever installer artifact we create (ISO image??) can be used both as an ISO installer and, once mounted, for PXE booting.

cc @ajeddeloh, since I know you have been interested in this space and have been exploring.

@jlelli - for next steps let's try to get together sometime in IRC (maybe next week) and brainstorm what path we want to take.

@ashcrow
Member

ashcrow commented Nov 26, 2018

I believe @nhorman also has a POC of using coreos-install as well.

@nhorman

nhorman commented Nov 26, 2018

@ashcrow yes, it's available at https://github.com/nhorman/coreos-dracut

@dustymabe , to make sure everyone is clear, both @jlelli and my POC are capable of being used as an ISO for install or for pxe boot. Both solutions consist of an arbitrary kernel (ideally the coreos production kernel), and an initrd built against said kernel. The specific boot method used is derived from how those two build artifacts are packaged (i.e. the build environment for coreos could build the initrd with either of our solutions, and package them into a bootable iso, from which pxe users could extract the kernel and initrd, as I think we discussed on our last call)

@dustymabe
Member Author

@dustymabe , to make sure everyone is clear, both @jlelli and my POC are capable of being used as an ISO for install or for pxe boot.

good to know. Thanks @nhorman

@ashcrow
Member

ashcrow commented Nov 28, 2018

@nhorman and @jlelli will be getting together this week to discuss the outputs of their bare metal installer POCs and see if a recommendation is reachable.

@dustymabe
Member Author

@nhorman and @jlelli will be getting together this week to discuss the outputs of their bare metal installer POCs and see if a recommendation is reachable.

is this something where @nhorman and @jlelli want to brainstorm before presenting to everyone else or should we all get together and brainstorm together? If the latter, would love to invite a group of others.

@ashcrow
Member

ashcrow commented Nov 28, 2018

This is them comparing notes before presenting.

@dustymabe
Member Author

@nhorman's POC issue against coreos-assembler: coreos/coreos-assembler#240

@nhorman

nhorman commented Nov 28, 2018

@dustymabe I don't think any brainstorming is really needed. @jlelli and I just want to make a recommendation to you regarding which of the already presented bare metal installer options to use, so that the RHCOS team can collectively work on getting it integrated into your build pipeline.

@yrobla

yrobla commented Dec 5, 2018

I tried to use coreos/coreos-assembler#240 with a downloaded coreos-maipo, but so far I'm getting several dependency errors, and the generated images are not valid. Is it valid to use it on those images? What workarounds would I need to apply? Are there existing generated images that I could use for my testing?

@dustymabe
Member Author

@yrobla I wouldn't recommend using this just yet as it is just a POC.

@dustymabe dustymabe added the meeting topics for meetings label Dec 5, 2018
@dustymabe
Member Author

We discussed this ticket during the community meeting yesterday and @lucab and I had a discussion about it in #fedora-coreos afterwards.

I wanted to capture some of the conversation from those discussions. Basically, I think we are circling around a strategy where we use the same initrd for FCOS and for an FCOS install; IOW we create a single initrd that gets shipped. It might get embedded in multiple places, but it's the same initrd. With this strategy (and in general) we think we can get by with shipping the following artifacts:

  1. kernel
  2. initrd (contains installer)
  3. disk image (this is the single disk image we are trying to ship everywhere)
  4. an ISO with 1/2/3 embedded inside of it
  5. obviously references to cloud images that can be used (i.e. AMIs, VHDs, etc)

So here the cases are:

  • Cloud
    • Use an AMI/VHD/etc...
  • Bare metal install, no PXE, no network
    • Grab the ISO (4) and install via ISO drive redirection or a burned CD
  • PXE install
    • Grab the kernel (1) and initrd (2) and run an install, specifying the remote location of the disk image
    • Optional: grab the ISO (4), host it in local infra, and use its URL in your PXE config
  • Diskless PXE boot (no install, just run from network)
    • Grab the kernel (1), initrd (2), and disk image (3) and boot directly from them over the network
    • The dracut man page seems to indicate we can just specify something like root=live:<url> and that the filesystem doesn't have to be tacked onto the initrd (thus we don't necessarily need a fat initrd, but can use a separate image). I don't yet know whether we can just use the golden image we already created (3) or whether it has to be in some other format.
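
The artifact list and the use cases above can be summarized as plain data, which makes it easy to see which artifacts each deployment path needs. The following Python sketch is purely illustrative: the names `ARTIFACTS`, `USE_CASES`, `artifacts_for`, and `live_root_arg` are hypothetical, and the `root=live:<url>` form is only what the comment above reads out of the dracut man page, not a verified FCOS boot option.

```python
# Artifact numbers follow the numbered list above.
ARTIFACTS = {
    1: "kernel",
    2: "initrd (contains installer)",
    3: "disk image",
    4: "ISO with 1/2/3 embedded",
    5: "references to cloud images",
}

# Which artifacts the user has to grab for each case discussed above.
USE_CASES = {
    "cloud": [5],
    "bare metal install, no PXE, no network": [4],
    "PXE install": [1, 2],  # the disk image (3) is fetched remotely during install
    "diskless PXE boot": [1, 2, 3],
}


def artifacts_for(use_case):
    """Return the artifact descriptions needed for a named use case."""
    return [ARTIFACTS[number] for number in USE_CASES[use_case]]


def live_root_arg(image_url):
    """Build the kernel argument for booting a root image over the network."""
    return "root=live:" + image_url
```

For example, `artifacts_for("PXE install")` lists only the kernel and initrd, which matches the point that the disk image location is passed to the installer instead of being shipped alongside them.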

The one corner case we've identified as not being handled in the current case (i.e. using a dracut module for the install and not using ignition at all during that process) is the case where the user may want to retrieve the disk image 3, during install from a server using a custom SSL certificate. i.e. we would need to get certs into the install environment in order to have trusted communication. In the past tectonic used ignition to do something like this. I don't know if dracut has the ability to handle a new cert setup at runtime (i.e. without requiring a new initrd to be generated first).

@nhorman

nhorman commented Dec 6, 2018

@dustymabe I would not recommend shipping (1) and (2) as separate artifacts. Rather I think it would make more sense to simply ship the iso, and a script to mount/extract the kernel/initrd/image from the iso. That way all the elements for a given release ship together as one atomic unit.

@lucab
Contributor

lucab commented Dec 7, 2018

The one corner case we've identified as not being handled in the current case (i.e. using a dracut module for the install and not using ignition at all during that process) is the case where the user may want to retrieve the disk image 3, during install from a server using a custom SSL certificate.

This was just a very quick example, I'd rather not focus on this specific detail. The same concern applies to a lot of similar knobs (HTTP proxy/authN configuration, client timeout/retries, checksum/signature verification, etc.). The general problem is that this builds a dedicated in-initramfs install flow which is parametrized over kernel cmdline (as opposed to the usual in-rootfs configurable via ignition-files).

Both dracut and anaconda have already been down that path and it reflects on their set of cmdline options (see dracut, anaconda).
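
To illustrate how quickly these knobs accumulate, here is a small Python sketch of an image fetch that takes a custom CA file, a timeout, a retry count, and an expected checksum. It is a sketch under assumptions, not a proposal for the actual installer: `FetchOptions`, `http_get`, and `fetch_image` are hypothetical names, and proxy and authentication settings are left out for brevity.

```python
import hashlib
import ssl
import time
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchOptions:
    """Hypothetical install-time tunables; every field would need a way to be set."""
    ca_file: Optional[str] = None   # custom CA bundle for a private image server
    timeout: float = 30.0           # per-attempt timeout in seconds
    retries: int = 3                # total number of attempts
    retry_delay: float = 1.0        # seconds to wait between attempts
    sha256: Optional[str] = None    # expected digest of the image, if known


def http_get(url, opts):
    """Fetch url once; trust opts.ca_file when given, else the system CA store."""
    context = ssl.create_default_context(cafile=opts.ca_file)
    with urllib.request.urlopen(url, timeout=opts.timeout, context=context) as resp:
        return resp.read()  # whole body in memory: fine for a sketch, not for real images


def fetch_image(url, opts, get=http_get, sleep=time.sleep):
    """Fetch with retries, then verify the checksum if one was supplied."""
    last_error = None
    for attempt in range(1, opts.retries + 1):
        try:
            data = get(url, opts)
        except OSError as err:  # URLError and socket timeouts are OSError subclasses
            last_error = err
            if attempt < opts.retries:
                sleep(opts.retry_delay)
            continue
        if opts.sha256 is not None and hashlib.sha256(data).hexdigest() != opts.sha256:
            raise ValueError("checksum mismatch for " + url)
        return data
    raise RuntimeError("giving up on %s after %d attempts" % (url, opts.retries)) from last_error
```

Each field of `FetchOptions` would have to be supplied somehow inside the initramfs, which is the crux of the question of kernel command line arguments versus an Ignition-style config.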

@nhorman

nhorman commented Dec 7, 2018

@lucab there's really no other choice, however. If your install is going to have lots of options, you need a place to specify those options, and the kernel command line (or interactive input) is the only way to fetch that information during the install.

@cgwalters
Member

I think the argument he's making is that rather than having special arguments like inst.lang/inst.keymap you'd actually just have an Ignition script that runs localectl. Another example would be rather than inst.sshd you just use Ignition to enable the regular sshd.service.

That said the counter-argument here is...while I find myself agreeing with lucab's argument on the other hand...there aren't that many of these.

@dustymabe
Member Author

rather than having special arguments like inst.lang/inst.keymap you'd actually just have an Ignition script that runs localectl

yeah - anything you could do after install, on firstboot via ignition, you would. we would not want to add arguments for those things.

I think he is referring to tunables inside the install environment. i.e. the only thing we need to do is be able to grab/verify the disk image but there may be things that need to be configured (ssl, retries, proxy, etc) to get the system to a state where it can do that. I think he is advocating for being able to perform those tunables via ignition rather than adding new kernel CLI args. The question there is: does it get confusing to have multiple "ignition runs"? One for install and one for firstboot afterwards? Maybe we just teach ignition how to fetch and dd a file to disk?

@nhorman

nhorman commented Dec 7, 2018

@cgwalters @dustymabe those options that you're referencing (inst.lang/inst.keymap/inst.sshd) can be removed, they're part of the anaconda module in dracut, just set your dracut configuration at build time to exclude that module. The only modules that you have to include are those which the coreos installer dracut module depends on (network/url-lib/kernel-modules/busybox/systemd/systemd-initrd I think is the list).

You will have to accept all the options that those modules come with, as getting rid of them would mean re-inventing most of dracut, and for the most part you are going to want those options because they relate to configuring the network, which you will need in the PXE install use case (i.e. if the system to be installed can only reach the server holding the RHCOS image over a VLAN, you need the ability to configure a VLAN, which is a command-line-driven option). Beyond that, however, the only options that the coreos-dracut module adds are inst.install_image, inst.dest_dev, and inst.ignition_url.

As for Ignition runs, there is only one at the moment. The installer does exactly what I understand the coreos-installer script to do: it downloads the Ignition config and places it in a predetermined directory for the first boot of RHCOS to pick up and execute.
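
As a rough illustration of how an install environment might pick those three options out of the kernel command line, here is a minimal Python sketch. The option names come from the comment above; the functions `parse_installer_args` and `unset_args` are hypothetical, and the actual coreos-dracut module is not written this way.

```python
# The three options named above; everything else on the command line is ignored.
INSTALLER_KEYS = ("inst.install_image", "inst.dest_dev", "inst.ignition_url")


def parse_installer_args(cmdline):
    """Extract the installer options from a kernel command line string.

    Tokens are split on whitespace only; quoted values containing spaces
    are not handled in this sketch.
    """
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")  # split on the first "=" only
        if sep and key in INSTALLER_KEYS:
            found[key] = value
    return found


def unset_args(found):
    """Return the installer options that were not given on the command line."""
    return [key for key in INSTALLER_KEYS if key not in found]
```

On a running system the input string would typically be read from /proc/cmdline. Whether all three options must be present is a policy decision for the installer; `unset_args` only reports which ones are absent.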

@dustymabe
Member Author

@dustymabe
yeah - anything you could do after install, on firstboot via ignition, you would. we would not want to add arguments for those things.

@nhorman
@cgwalters @dustymabe those options that you're referencing (inst.lang/inst.keymap/inst.sshd) can be removed, they're part of the anaconda module in dracut, just set your dracut configuration at build time to exclude that module

right. I think colin and I were trying to point out that most of anaconda's args won't be needed because we will have ignition that will run after the install. I think we're all in agreement there.

@dustymabe
I think he is referring to tunables inside the install environment. i.e. the only thing we need to do is be able to grab/verify the disk image but there may be things that need to be configured (ssl, retries, proxy, etc) to get the system to a state where it can do that. I think he is advocating for being able to perform those tunables via ignition rather than adding new kernel CLI args. The question there is: does it get confusing to have multiple "ignition runs"? One for install and one for firstboot afterwards? Maybe we just teach ignition how to fetch and dd a file to disk?

Here ^^ I was trying to clarify luca's point that we will need to add some options, but not all options anaconda would have needed in the past.

@nhorman

nhorman commented Dec 7, 2018

OK, that's fine. When you integrate this into your build pipeline, just add this line to the dracut.conf file:
omit_dracutmodules+=" anaconda "

That will remove the anaconda module from the dracut initrd, and in so doing will remove these options from the command line:
https://fedoraproject.org/wiki/Anaconda_Command_Line_Options

@dustymabe
Member Author

+1 - I don't think we even need to do that since we are planning to use the same initrd from FCOS and we don't install the anaconda software in FCOS.

[root@coreos modules.d]# find /usr/lib/dracut/modules.d/ | grep anaconda
[root@coreos modules.d]#

@nhorman

nhorman commented Dec 7, 2018

No, you don't need to remove anything, but you may want to, just to save the space taken up by the anaconda module in the initrd.

@dustymabe
Member Author

Will investigate the strategy from #50 (comment) over in #91

@dustymabe dustymabe removed the meeting topics for meetings label Dec 19, 2018
@fzdarsky

  3. disk image (this is the single disk image we are trying to ship everywhere)

@dustymabe does this actually refer to a disk (block) image like a qcow2/raw that would get dd'ed to disk or a file image like a tarball that'd get untar'ed/cp'ed to disk?
I'm asking because we'll sooner or later want to support user-configured logical volumes, partitions, etc. to deal with different #/size/type of disks per node, meet security compliance, etc. and the latter would make this so much easier.

@dustymabe
Member Author

@dustymabe does this actually refer to a disk (block) image like a qcow2/raw that would get dd'ed to disk or a file image like a tarball that'd get untar'ed/cp'ed to disk?

disk/block image

I'm asking because we'll sooner or later want to support user-configured logical volumes, partitions, etc. to deal with different #/size/type of disks per node, meet security compliance, etc. and the latter would make this so much easier.

The plan here is that every install of FCOS will have the same starting point (same disk/filesystem setup) on first boot. Based on the provided ignition config additional disk/filesystems can be set up. This is how Container Linux operates today.

@fzdarsky

@dustymabe thanks for clarifying. Some use cases we frequently encounter are security compliance (e.g. segregating /var/log, /tmp, /home, etc. into separate partitions/volumes, encrypting /home, etc. as a prerequisite for going into production) or having a single large disk and wanting to use the space not used by the host OS for, say, Ceph OSDs. I'd just like to ensure users don't have to jump through hoops (like creating new images) to enable these use cases.

@cgwalters
Member

cgwalters commented Dec 20, 2018

does this actually refer to a disk (block) image like a qcow2/raw that would get dd'ed to disk or a file image like a tarball that'd get untar'ed/cp'ed to disk?

The latter is a bit like what https://github.com/ostreedev/ostree/ is - except it's way more sophisticated than a tarball. So yes, you can apply the same OSTree "commit" on any filesystem+block storage layout you choose (plain ext4, BTRFS, XFS on LVM, XFS on LVM on dm-crypt; whatever). It worked that way since the start with Fedora/RHEL Atomic Host, which used Anaconda.

That said, while we have no plans to throw away that flexibility, I think our emphasis here is (for a lot of good reasons) on shipping a dd-able image. The OSTree layer will be used for efficient in-place updates.

@dustymabe
Member Author

I'm going to close this out since we now have installer ISOs that can be used for ISO installs and PXE installs. See #91 (comment) for more context.


9 participants