How to support installing kernel modules #249

Open
miabbott opened this issue Aug 8, 2019 · 17 comments


@miabbott
Contributor

commented Aug 8, 2019

Users may need to install kernel drivers on their hosts to support additional hardware. This could be required for boot (a day 1 operation) or after install to enable adapters (a day 2 operation).

The straightforward way to accomplish this is to package the drivers in RPM format, so that they can be installed via rpm-ostree install. Users may want to be able to build these drivers on an FCOS host, which would require a container with the necessary dependencies installed.

It would be useful to come up with a framework that is generic enough to be reused by multiple drivers and that can produce multiple versions of a driver (one per kernel version).

Copying notes from @cgwalters below:

There are conceptually three phases, all linked: how modules are built, how they're delivered, and finally how they're installed on the host. As I noted elsewhere, I think we should make it easy to have a single container image supporting multiple kernel versions. Delivery would be something like /usr/lib/modules/$kver/foo.ko with multiple $kver in the container. How they're installed gets tricky if we want to integrate with upgrades. Perhaps the simplest thing is to have RPMs of each kernel module that Require: their exact target kernel. Then the container content is provided to the host, we inject an /etc/yum.repos.d/kmods-$provider.repo that points to it, and run rpm-ostree install kmod-$provider. Then on upgrade rpm-ostree will try to pick the right one, and fail if it's not available.
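As a rough illustration of that last step, the host side could look something like the following sketch (the repo id, baseurl, and package name are placeholders, not anything that exists today):

```sh
# Hypothetical repo file injected onto the host; all values are placeholders.
cat > /etc/yum.repos.d/kmods-example.repo <<'EOF'
[kmods-example]
name=Example kernel modules
baseurl=http://kmods-example.svc/repo
enabled=1
gpgcheck=0
EOF

# Layer the module package. Per the proposal above, on upgrade rpm-ostree
# would pick the build whose exact-kernel Requires matches the new tree,
# and block the upgrade if none is available.
rpm-ostree install kmod-example
```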

@cgwalters

Member

commented Aug 8, 2019

A hugely tricky question here is whether 3rd parties will want a mechanism that also works nearly the same way for yum-managed systems - how tolerant will they be of a distinct mechanism for FCOS? It may depend.

One thing I mentioned in the Silverblue+nvidia discussion is that we could add rpm-ostree support for arbitrary hooks run during upgrades. Today, %post scripts from installed RPMs are constrained; imagine instead something like /etc/rpm-ostree/hooks.d whose hooks are passed the new target rootfs as an argument. That would allow near-total flexibility, because a hook could just run a container that did whatever it wanted, from building a module to checking for a pre-built one; if a hook exited with failure, that would also block the upgrade.
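To make that concrete, here is a minimal sketch of what such a hook might look like, assuming hooks under /etc/rpm-ostree/hooks.d receive the new target rootfs as their first argument (the directory layout and builder image name are made up):

```sh
#!/bin/bash
# Hypothetical /etc/rpm-ostree/hooks.d/ hook; per the proposal above,
# $1 would be the path to the new target rootfs.
set -euo pipefail
new_root=$1

# Find the kernel version shipped in the new deployment.
kver=$(ls "${new_root}/usr/lib/modules" | head -n1)

# Run a builder container against that kernel; a non-zero exit here
# would block the upgrade.
exec podman run --rm -v "${new_root}:/target:z" \
    quay.io/example/kmod-builder:latest "${kver}"
```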

@cgwalters

Member

commented Aug 8, 2019

One useful pattern then would be having a Kubernetes daemonset container inject its hook into the host on startup, ensuring that it gets executed when an upgrade is attempted.
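A sketch of what that injection could look like from inside the daemonset pod, assuming the host's /etc is mounted at /host/etc and a hooks.d mechanism as proposed above (the hook name and paths are made up):

```sh
# Daemonset container entrypoint (sketch): drop the hook onto the host,
# then stay resident so Kubernetes keeps the pod healthy.
set -euo pipefail
install -D -m 0755 /opt/hooks/50-kmod-check \
    /host/etc/rpm-ostree/hooks.d/50-kmod-check
exec sleep infinity
```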

@dustymabe

Member

commented Aug 9, 2019

The easiest/cleanest approach is to have all kernel modules built for every kernel and provided via an rpm that requires that kernel. For example someone could set up a copr that triggers on every kernel build and builds a related kernel module rpm for that kernel. Then adding the yum repo and rpm-ostree installing the rpm should suffice, correct?
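The build side of that could be as simple as a loop over the kernels in the repos; a rough sketch, assuming a spec file that accepts a kernel_version macro and emits a versioned Requires on that exact kernel (the macro name and spec are assumptions, not an existing convention):

```sh
# Build a matching kmod RPM for each kernel available in the repos
# (in copr this would instead be triggered automatically per kernel build).
for kver in $(dnf repoquery --qf '%{version}-%{release}.%{arch}' kernel); do
    dnf install -y "kernel-devel-${kver}"
    rpmbuild -ba kmod-example.spec --define "kernel_version ${kver}"
done
```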

It's a lot uglier when we have to recompile on upgrade on the host. Especially when that host is supposed to be minimal (hence why you need to do it in a container).

@cgwalters

Member

commented Aug 9, 2019

coreos/rpm-ostree#1882 is a quick hack I started on the hooks thing.

@bgilbert

Member

commented Aug 13, 2019

Then on upgrade rpm-ostree will try to pick the right one, and fail if it's not available.

@lucab If an upgrade fails, will Zincati retry later, or give up immediately? This seems like a case where a later retry might succeed.

@lucab

Member

commented Aug 13, 2019

Zincati will keep retrying after some delay, both when trying to stage (i.e. deploy --lock-finalization) a new release and when trying to finalize (i.e. finalize-deployment) a deployment that it has previously staged successfully.
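For reference, a rough sketch of those two operations expressed as their rpm-ostree CLI equivalents (the version string and checksum are placeholders):

```sh
# Stage the new release without finalizing it (i.e. without making it the
# default for the next boot); retried later if it fails.
rpm-ostree deploy --lock-finalization 30.20190801.0

# When Zincati decides to finish the update, it finalizes the exact
# deployment it staged earlier.
rpm-ostree finalize-deployment <checksum-of-staged-deployment>
```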

@jlebon

Member

commented Aug 13, 2019

The easiest/cleanest approach is to have all kernel modules built for every kernel and provided via an rpm that requires that kernel. For example someone could set up a copr that triggers on every kernel build and builds a related kernel module rpm for that kernel. Then adding the yum repo and rpm-ostree installing the rpm should suffice, correct?

I think I agree with this. It works just as well on FCOS/RHCOS as on traditional yum/dnf-managed systems. In the context of immutable host clusters, it makes more sense to me to build the kernel module once than have e.g. potentially thousands of nodes all compiling them on each upgrade. Not just for efficiency, but also for keeping down the number of things that could go wrong at upgrade time.

The flip side of this though is that we're then on the hook (pun intended) to provide tooling for this. Not everyone can use COPR. For RHCOS... maybe what we want is a way to hook into the update payload delivery flow so one can work on top of the new machine-os-content similarly to openshift/os#382?

@cgwalters

Member

commented Aug 21, 2019

For example someone could set up a copr that triggers on every kernel build and builds a related kernel module rpm for that kernel. Then adding the yum repo and rpm-ostree installing the rpm should suffice, correct?

Yeah, this is a fine approach.

@cgwalters

Member

commented Aug 21, 2019

A slightly tricky thing here, though, at least for RHCOS, is that I'd like to support shipping the kernel modules in a container via e.g. a daemonset - this is a real-world practice. Doing that with the "multi-version rpm-md repo" approach... hm, maybe the simplest option is actually to write a MachineConfig that injects the .repo file, and run a service that hosts the rpm-md repo.
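A rough sketch of what the hosting side of that could look like inside the daemonset container (the paths and port are assumptions):

```sh
# Generate rpm-md metadata for the RPMs shipped in the container image,
# then serve them over HTTP so the injected .repo file can point at this pod.
createrepo_c /srv/kmods
cd /srv/kmods
exec python3 -m http.server 8080
```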

@cgwalters

Member

commented Aug 22, 2019

I've been thinking about this a lot lately; we've had a ton of discussions and the usual pile of private Google docs. I want to emphasize how much I have come to agree with Dusty's comment.

One issue with this is that we don't have any direct package layering support in the MCD; we'd probably have to document dropping the /etc/yum.repos.d/nvidia.repo file and running rpm-ostree install nvidia-module or whatever via a daemonset. But in the end that gunk could be wrapped up in a higher-level nvidia-operator or whatever.

@lucab

Member

commented Aug 22, 2019

For reference, here is how people have been bringing the nvidia & wireguard modules to CL on k8s: https://github.com/squat/modulus

@cgwalters

Member

commented Aug 23, 2019

OK now I got convinced in another meeting that:

  • Exposing RPMs to users is too raw
  • Requiring a new service to build and maintain the RPM repo as kernel updates come in is not obvious

The core problem with atomic-wireguard and similar CL-related projects is that they don't have a good way to do the "strong binding" I think is really important: again, blocking the upgrade if the kernel module won't work with the new kernel.

So that seems to take us back to coreos/rpm-ostree#1882, which will be generally useful anyway.

cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Aug 23, 2019
WIP: Add /etc/rpm-ostree/roothooks.d
This is intended to support kernel module systems like
[atomic-wireguard](https://github.com/jdoss/atomic-wireguard).
See the Fedora CoreOS tracker issue:
coreos/fedora-coreos-tracker#249

With a "roothook", one can perform arbitrary modifications to the *new* root
filesystem; if a hook exits with an error, that also stops the upgrade.

Specifically with this, atomic-wireguard could *block* an upgrade if
the new kernel isn't compatible with a module.

cgwalters added a commit to cgwalters/rpm-ostree that referenced this issue Sep 2, 2019
WIP: Add /etc/rpm-ostree/roothooks.d

@dustymabe

Member

commented Sep 4, 2019

OK now I got convinced in another meeting that:

  • Exposing RPMs to users is too raw
  • Requiring a new service to build and maintain the RPM repo as kernel updates come in is not obvious

Hmm, exactly who are we concerned about exposing things to? Is it end users or is it module producers? For example, with wireguard we could work with the maintainer and set up one project that does the building of the rpms and the creation of repos for each new kernel. So we expose the pain of the "build service" to one person (or a small group of people) and the end users don't feel it. The end users simply add the yum repo, rpm-ostree install the rpm, and it should work from then on.

@imcleod


commented Sep 11, 2019

Dusty, I’m largely responsible for the back and forth on this so I’ll try to re-frame a bit here.

I’ll summarize one proposal in two points. To use an out-of-tree module on *COS:

  1. The module must be packaged as an RPM, able to be rebuilt against a specific kernel and result in an RPM that has a hard dependency on the kernel it is compiled for.
  2. Something must be responsible for maintaining a repo (public, cluster-local, org-local, etc.) that is populated with these compiled RPMs for new *COS releases as they come out, and before the updated release becomes the target for a node or cluster update.

I’ve no doubt that if the two conditions above are met, the resulting behavior at the *COS level will be robust, bordering on bulletproof. Nothing prevents the community from trying to move forward with this.

I have two concerns.

Firstly, the existence of 2) above is problematic. In the product context (by which I mean OpenShift running on RHCOS) I’m getting hard pushback on the idea of introducing a new service/container that is responsible for hosting such a repo, and updating it with fresh RPM builds as needed, in coordination with the updates of the underlying *COS kernel. I don’t know what else to say on this point, other than that if we don’t have this repo, we do not have this solution.

My deeper concern is with point 1) above. Put bluntly I suspect that if we require RPM-ification as a prerequisite for third party modules on *COS, we will get far fewer third party modules on *COS.

To be clear, I’m not saying that it’s not possible to rpm-ify all desirable modules. What I am saying is that it’s extremely unlikely to happen organically. It has had plenty of time to happen organically on Fedora and RHEL and has not. There are very good tools and approaches that can be used to do this with RPMs, and they come with many of the same advantages that the proposal outlined above would give. In spite of this, after over a decade and a half of RHEL and Fedora, some third-party kernel modules are RPM-ified but many are not.

If, as I fear, it doesn’t happen organically, it will not happen. We simply do not have the bandwidth in the *COS teams and the broader community to maintain these SPECs and supporting scripts on our own, nor do we have the deployed base to provide the incentive to third parties to adopt this approach. (Again, if Fedora/RHEL/CentOS can’t drive this, how will we?)

What has happened organically in the kube/container space are variations on the approach best represented by Joe’s work on wireguard. I’d summarize this as:

  1. Take the third party kernel material in whatever form it is currently delivered.
  2. Automate the rebuild step, either within a container build task, or within a running container, using scripting of whatever mechanism is most appropriate for the material as delivered.
  3. Define a minimal API-like interface to interact with these containers. Essentially: build, load, reload, unload and possibly “build for this pending kernel and err if it fails”

This is substantially less prescriptive than RPMs plus package layering, has the advantage of being container-native-ish, and uses packaging/bundling techniques with a much larger user base (container builds and running containers).
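For illustration, a sketch of what such a minimal interface could look like as a container entrypoint; the verbs follow the list above, and the module name, paths, and build step are stand-ins, not any existing convention:

```sh
#!/bin/bash
set -euo pipefail

build_module() {
    # Stand-in for whatever compile step the module actually needs;
    # here, a standard out-of-tree kbuild against the headers for kernel $1.
    make -C "/usr/src/kernels/$1" M=/opt/src modules
}

case "${1:-}" in
    build)  build_module "$(uname -r)" ;;
    load)   insmod /opt/src/example.ko ;;
    unload) rmmod example ;;
    reload) rmmod example || true; insmod /opt/src/example.ko ;;
    check)  build_module "$2" ;;  # "build for this pending kernel", err on failure
    *)      echo "usage: $0 {build|load|reload|unload|check <kver>}" >&2; exit 1 ;;
esac
```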

Thoughts?

@ashcrow

Member

commented Sep 12, 2019

If, as I fear, it doesn’t happen organically, it will not happen. We simply do not have the bandwidth in the *COS teams and the broader community to maintain these SPECs and supporting scripts on our own, nor do we have the deployed base to provide the incentive to third parties to adopt this approach. (Again, if Fedora/RHEL/CentOS can’t drive this, how will we?)
What has happened organically in the kube/container space are variations on the approach best represented by Joe’s work on wireguard

👍

This is substantially less prescriptive than RPMs plus package layering and has the advantage of being container-native-ish and uses packaging/bundling techniques with a much larger user base (container builds and running containers).

I agree with this. As noted, there isn't anything wrong with RPMs, package layering, etc.; in fact they are quite powerful. But I tend to believe using OCI containers + builds has less friction as it already has uptake.

@dustymabe

Member

commented Sep 12, 2019

Put bluntly I suspect that if we require RPM-ification as a prerequisite for third party modules on *COS, we will get far fewer third party modules on *COS.

I figured most things that people in the Fedora/RHEL/CentOS ecosystem care about can already be delivered as an rpm. I didn't know this was that big of a blocker.

  1. Take the third party kernel material in whatever form it is currently delivered.
  2. Automate the rebuild step, either within a container build task, or within a running container, using scripting of whatever mechanism is most appropriate for the material as delivered.
  3. Define a minimal API-like interface to interact with these containers. Essentially: build, load, reload, unload and possibly “build for this pending kernel and err if it fails”

Regarding steps 1/2, that's exactly what I was proposing we do on the build side somewhere, and then the output of that process would be rpms that could then be consumed. I think my whole point here is that it would be much cleaner to do it this way than it would be to add hooks to execute things on the host (that may or may not fail) that then modify the host on every upgrade.

I think you've laid out a few points about why it's too hard to do it that way.

@cgwalters

Member

commented Sep 12, 2019

My deeper concern is with point 1) above. Put bluntly I suspect that if we require RPM-ification as a prerequisite for third party modules on *COS, we will get far fewer third party modules on *COS.

As I've said, I am quite sure it'd be easy for us to provide a container image which accepts kernel module sources (or, potentially, a pre-built module) and generates an RPM.

but I tend to believe using OCI containers + builds has less friction as it already has uptake.

But that doesn't solve the binding problem on its own. We're talking about kernel modules which execute fully on the host, so saying "OCI containers" is deceptive as it's really host-tied. There are some blurry lines here about how much containers are used, but it's not just containers.
