Best practices for delivering container content that executes on the host #354

Open · cgwalters opened this issue Jan 27, 2020 · 14 comments

@cgwalters (Member) commented Jan 27, 2020

We emphasize containers, but there are cases where code needs to execute on the host. Package layering is one approach; it has some advantages, but also major disadvantages.

In OpenShift today, we have a pattern of privileged containers that lay down and execute binaries on the host. This avoids the reboots inherent in layering (and it doesn't require RPMs).

Recently, however, this broke: binaries built for e.g. RHEL7 may fail on a RHEL8 host if they link against e.g. openssl. The general best practice here is that the binaries need to target the same userspace as the host. With statically linked Go/Rust-type code one can avoid most issues, but not all (and you really want to dynamically link openssl).

See this Dockerfile which pairs with this PR.

Further, I think we should move these binaries into e.g. /run/bin or so: if there's a higher-level process that pulls the containers to a node on reboot (e.g. a systemd unit created via Ignition, or in the OpenShift case the kubelet), then having the binaries in /run helps ensure that if the container is removed, the binaries will at least go away on reboot.
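
As a rough sketch (the /run/bin path matches the above; the "mytool" name is purely illustrative), the entrypoint of such a privileged container might be little more than:

#!/bin/bash
# Illustrative entrypoint for a privileged container that delivers a host
# binary; assumes the host rootfs is mounted at /host, and "mytool" is a
# placeholder name.
set -euo pipefail

# Install into /run on the host so the binary is discarded on reboot.
install -d -m 0755 /host/run/bin
install -m 0755 /usr/local/bin/mytool /host/run/bin/mytool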

That said... we might even consider shipping something like a /host/usr/bin/coreos-host-overlay install <name> <rootfs> tool that assumes the host rootfs is mounted at /host and handles the case where e.g. a container delivering host binaries is upgraded (or removed) before reboot.
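
A very rough sketch of what such a (purely hypothetical, not-yet-existing) tool might do internally:

#!/bin/bash
# Hypothetical sketch only; coreos-host-overlay does not exist today.
# Usage: coreos-host-overlay install <name> <rootfs>
set -euo pipefail
name=$1 rootfs=$2

# Stage the content under /run so it is transient by default.
dest=/host/run/host-overlays/${name}
install -d -m 0755 "${dest}/bin"
cp -a "${rootfs}/bin/." "${dest}/bin/"

# Record a manifest so a later upgrade/removal before reboot can clean up
# exactly what was installed.
find "${dest}" -type f > "/host/run/host-overlays/${name}.manifest"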

(This problem set quickly generalizes of course to something like "transient RPMs")

@cgwalters (Member Author)

To be clear, the result of this could be some documentation; or it could be code. I think if we do nothing though, people are going to do manifestly bad things.

@cgwalters (Member Author)

Yet another reason I'd like these binaries in /run is that eventually I'd like the binaries that come with the host to be signed, and to do something like enforce that any privileged code executed from persistent storage comes from signed binaries.

@danwinship

See also https://issues.redhat.com/browse/SDN-695 for some older thoughts specifically on the deploying-CNI-plugins angle. We definitely need something simpler than what we're doing now.

oh... this is an FCOS bug and that link probably isn't public. Well, the suggestion was to have some sort of config somewhere, something like:

cniPlugins:
  - name: cni-default
    sourceImage: quay.io/openshift/origin-container-networking-plugins:4.3
    rhel7Plugins:
      - /usr/src/plugins/rhel7/bin/*
    rhel8Plugins:
      - /usr/src/plugins/rhel8/bin/*
    plugins:
      - /usr/src/plugins/bin/*
  - name: openshift-sdn
    sourceImage: quay.io/openshift/origin-sdn:4.3
    plugins: 
      - /opt/cni/bin/openshift-sdn
  - name: multus
    ...

and then something would know how to pull the binaries out of those images and ensure they got installed correctly.

Re /run/bin: there is some trickiness with how CNI works, which is terrible, but we may need the multus binary to be in its own directory with no other binaries in it (to avoid confusing cri-o about when the CNI plugin is ready), and we need all of the other CNI plugins to be in a directory with no non-CNI-plugin binaries (to avoid privilege escalation via multus). So anyway, we may need /run/multus/bin and /run/cni/bin, or /run/bin/multus/ and /run/bin/cni/.
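
For example (destination directories as above, source paths purely illustrative), the copy step might end up looking like:

# Illustrative only: keep multus separate from the other CNI plugins so
# each directory contains exactly what cri-o / multus should see.
install -d -m 0755 /host/run/multus/bin /host/run/cni/bin
install -m 0755 /usr/src/multus/bin/multus /host/run/multus/bin/
cp -a /usr/src/plugins/bin/. /host/run/cni/bin/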

@cgwalters (Member Author)

Yeah, /run/multus/bin is fine too.

@cgwalters (Member Author)

One thing we could do to generalize this is to have first-class support in ostree (and rpm-ostree) for transient package installs; this is strongly related to live updates except here we'd want the package install to skip the "persistence" step.

On the libostree side it'd be a bit like ostree admin unlock, except we'd still keep /usr as a read-only bind mount. On the rpm-ostree side we'd need to keep more careful track of transient vs. dynamic state; it would likely involve two "origin" files, one in /run.

This would allow e.g. CNI to use what appears to be /usr/bin/cni or whatever, except it'd actually be on a tmpfs and go away on reboot.
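
Very roughly, the effect would be something like a writable overlay over /usr whose upper layer lives on a tmpfs (a sketch of the idea only, not how libostree would actually implement it):

# Sketch: anything installed into /usr via this overlay lands in /run and
# therefore disappears on reboot.
mkdir -p /run/usr-transient/upper /run/usr-transient/work
mount -t overlay overlay \
  -o lowerdir=/usr,upperdir=/run/usr-transient/upper,workdir=/run/usr-transient/work \
  /usr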

@cgwalters (Member Author)

mrunalp suggested using podman mount for these cases, which would be really nice except we'd need to figure out SELinux labeling. Maybe we could force podman to mount everything as bin_t or so and just assume they're all binaries.
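
A rough sketch of the idea (the image name and binary path are placeholders; the chcon line is exactly the labeling question that would need answering):

# Sketch: execute a host binary directly out of a podman-mounted image.
ctr=$(podman create quay.io/example/host-tools:latest)
mnt=$(podman mount "${ctr}")
chcon -R -t bin_t "${mnt}/usr/bin"   # or policy that makes this step unnecessary
"${mnt}/usr/bin/mytool"
podman umount "${ctr}"
podman rm "${ctr}"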

@smarterclayton

Re: the config file, that file has to be dynamic. Who generates it?

@danwinship

Yeah, I don't think it could actually be a single config file. It would be more like: the CNO takes every object of a certain CRD type and combines them to generate the list of plugins to install.

In particular, one of the use cases was making it easier for third-party network plugins to install their CNI binaries without needing to know which directories we've configured cri-o to use. In that case, no OpenShift component would know what plugin was being installed, so there couldn't be a single config file generated by an OpenShift component.

@danwinship

(The original idea partly quoted above was that everyone would just add elements to an array in the network.config.openshift.io object, but that would be hard to coordinate well, and we don't want the admin to be able to accidentally break things by removing the wrong elements anyway)

@c3d commented Mar 11, 2020

Just to make sure I understand the idea correctly:

  • The privileged container sees mounts for some part of the host filesystem, e.g. /run as /host/run, and then copies some content there, e.g. /host/run/some-product/bin/some-product
  • The host then executes /run/some-product/bin/some-product and expects to locally find a suitable set of libraries, etc

I may be wrong, but I think things would have a better chance of working right (e.g. the RHEL7 vs RHEL8 openssl binary example) if you also shipped the libraries, i.e. if you also had a /run/some-product/lib that would be added to LD_LIBRARY_PATH before running the binary.
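
For example (with the hypothetical /run/some-product layout above):

# Point the dynamic linker at the libraries shipped alongside the binary.
LD_LIBRARY_PATH=/run/some-product/lib /run/some-product/bin/some-product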

Going further, and staying in a "container" spirit, I believe that you would practically need to do a chroot to /run/some-product and run the binary from there. Of course, that means you need to expose the relevant files in that chroot. But that means you would not need to copy to some other host location, and would solve the issue of removing the files when you remove the container.

So in the end, I think that what confuses me is:

  • Either you want the content to be transient and be removed if the container is removed, in which case it looks like running within the container environment itself, but possibly without dropping capabilities, etc, might be the safest route wrt. existing container build and testing practice (i.e. hiding details of the host OS like what particular version of openssl libraries are there)

  • Or you want the content to be persistent, which means the container is used to install something on the host, in which case you specifically don't want the files to be removed if the container is removed. However, I don't know how you would remove such files, except by putting them in a transient location (that could be /run) and cleaning up only on reboot.

So maybe to help me understand better, how do you see things running:

  1. Executing the container copies the files to the host, and the copied files remain available even after you remove the image
  2. Installing the container image provides the files, running the container "activates" the files, e.g. by modifying a search path to add the container image, or by adding host links into the container filesystem
  3. The container image provides the file in a standard way, and the container does not even need to run for the host to know where to find the files to execute

@cgwalters (Member Author)

Going further, and staying in a "container" spirit, I believe that you would practically need to do a chroot to /run/some-product and run the binary from there.

No. This model is intended for binaries that need to execute in the host's mount namespace or otherwise interact with things that run on the host: kubelet, etc. No one should use chroot() in 2020; use a real container runtime if your binaries can be containerized.

@danwinship

Basically, this is not 'in a "container" spirit'; this is abusing containers as a distribution mechanism for host binaries.

@cgwalters (Member Author) commented Mar 11, 2020

I may be wrong, but I think that things would have increased chances of working right (e.g. the RHEL7 vs RHEL8 openssl binary example)

You're right about this; basically, what one needs to do here to support multiple operating systems is to have e.g. a rhel7/ and a rhel8/ subdirectory in the container, inspect the host's /etc/os-release, and copy out the right binaries. That's what we ended up doing for the SDN. It definitely gets more complicated if you have library dependencies that aren't on the host, but the hope is that one avoids that. But using LD_LIBRARY_PATH would work too.
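
A sketch of that dispatch, as it might appear in the container's copy script (source/destination paths follow the example config above):

# Sketch: pick the binaries matching the host userspace, then copy them out.
source /host/etc/os-release
case "${VERSION_ID%%.*}" in
  7) srcdir=/usr/src/plugins/rhel7/bin ;;
  8) srcdir=/usr/src/plugins/rhel8/bin ;;
  *) echo "unsupported host: ${ID} ${VERSION_ID}" >&2; exit 1 ;;
esac
cp -a "${srcdir}/." /host/run/cni/bin/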

That said, see #354 (comment) for the "use RPMs" path.

@cgwalters (Member Author)

This also relates to #401, which is more about persistent extensions: things one needs when the host boots, before the kubelet, etc.

cgwalters added a commit to cgwalters/ostree that referenced this issue Mar 19, 2020
Support read-only queries, but deny any attempts to make
changes with a clear error.

Now...in the future, we probably do want to support making
"transient" overlays.  See:
coreos/fedora-coreos-tracker#354 (comment)

Closes: ostreedev#1921
cgwalters added a commit to cgwalters/fedora-coreos-config that referenced this issue Apr 8, 2020
See: coreos/fedora-coreos-tracker#354

Basically we want to support things like containers that contain
binaries designed to execute on the host.  These should really
be "lifecycle bound" to the container image.  Let's at least
have an obvious place to drop them that goes away on reboot.