Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider deleting userdata from provider after Ignition completes #1315

Closed
bgilbert opened this issue Jan 29, 2022 · 11 comments · Fixed by #1350
Closed

Consider deleting userdata from provider after Ignition completes #1315

bgilbert opened this issue Jan 29, 2022 · 11 comments · Fixed by #1350

Comments

@bgilbert
Copy link
Contributor

bgilbert commented Jan 29, 2022

Feature Request

Environment

At least VirtualBox, VMware

Desired Feature

Some platforms allow modifying or removing userdata using APIs accessible from the VM instance. On those platforms, consider deleting the Ignition config from the platform after Ignition completes.

Other Information

Ignition configs may contain sensitive information that should not be accessible from workloads running on the machine. We currently do nothing to help users with sensitive userdata, and there is no documentation about the issue (coreos/fedora-coreos-docs#306). On platforms with network metadata services, the user can write an Ignition config that firewalls off the metadata service (not documented in Fedora CoreOS: coreos/fedora-coreos-docs#247). On platforms with in-hypervisor metadata accessible by non-root users (at least VMware), firewalling isn't an option, and the user must delete the metadata themselves.

Various minimalistic fixes are possible. We could publish docs on adding a systemd service that disables userdata access. If an external tool is necessary, we could ship it or recommend adding it to the OS. Or we could recommend that OSes check the platform ID and disable userdata access themselves.

However, we generally try to keep Ignition secure by default. I think it would make sense to delete userdata automatically wherever we have an API to do so. This is most important for platforms that can't be remediated by firewalling, but is still a worthwhile improvement on the others. On platforms where we can't delete userdata, we could log a warning recommending that users firewall or otherwise mitigate their risk.

In principle this would be a compatibility break, since it's possible for other software in the instance (or on a VM host) to intentionally access userdata. (That would be hacky, since the userdata must be formatted as an Ignition config for Ignition to care about it, but it's conceivable.) If we're concerned about this, we could add an Ignition config field (ignition.config.delete?) configuring whether Ignition performs the deletion. I think the default should certainly be true after the next major spec bump, but the short-term default is less clear, especially for old specs that don't have the field.

We'd need to delete userdata after Ignition completes, not after fetch succeeds. There are two reasons: kargs changes may reboot the machine and refetch userdata, and a failed Ignition run may be retried by the OS (since Ignition is mostly idempotent).

Platform notes

AWS

Network metadata service. Userdata can be changed with the ModifyInstanceAttribute control plane request, but there's no instance-side API. Control plane access is probably out of scope for us.

Azure

On FCOS and RHCOS, Azure "custom data" access is already disabled at runtime, outside Ignition's control. Azure doesn't allow full access to a VM until it "checks in" to report that it has provisioned successfully, and this checkin causes Azure to disable access to the config drive. Checkin is handled by Afterburn, which runs in the initramfs on RHCOS (to avoid long delays for console access) and in the real root on FCOS (for eventual integration with automatic rollback: coreos/fedora-coreos-tracker#47). It may or may not make sense to move this into Ignition.

The new Azure IMDS userdata mechanism (#1351) does not act this way. The instance can access userdata even after checkin, and there's no instance-side API to disable it.

DigitalOcean

Network metadata service does not appear to support changing userdata.

GCP

Network metadata service does not appear to support changing regular metadata attributes via the metadata service. Guest attributes can be changed from the guest, but the user-data attribute is not one of those. Some additional discussion in coreos/fedora-coreos-docs#306.

Packet

Network metadata service does not appear to support changing userdata.

qemu

fw_cfg userdata is only accessible by root. Write support was removed in qemu/qemu@023e314.

We also support config drives on non-x86_64.

VirtualBox

Guest properties can be modified from inside the guest. They're documented as being accessible only to root, but it still seems useful to remove userdata.

VMware

Userdata is accessible by non-root users. It can be stored directly in guestinfo or in an XML document in guestinfo, and we'd have to handle both cases. See #1300.

@jlebon
Copy link
Member

jlebon commented Jan 31, 2022

Torn on whether we should do this by default or make it opt-in too. The fact that it's also only possible on some platforms is a concern.

Maybe cleanest is a delete key which defaults to false, but if set to true, then we delete it on platforms where we can, but otherwise clearly error out where we can't. That way, users aren't misled into thinking their secrets are safe and can take alternative measures (e.g. firewalling or using platform APIs with higher creds).

@bgilbert
Copy link
Contributor Author

I don't like the idea of making a security feature opt-in, even if we can't support it everywhere. But also, as a practical matter, Butane and Ignition configs aren't inherently platform-specific. (CT generated Ignition configs that were, and it always felt awkward.) For any deployments that use the same config on dev and prod platforms with different support for this option, we'd be forcing them to either maintain separate configs or accept the lower-security option in both places.

I'm leaning toward deleting configs by default, maybe even without an override. I can construct hypothetical cases where someone might need the config left in place, but none of them feel particularly natural. We can always go back and add an override if it turns out someone needs it.

(Alternative: we could have a tri-state, keep/delete/try-delete, defaulting to try-delete.)

@jlebon
Copy link
Member

jlebon commented Feb 1, 2022

I don't like the idea of making a security feature opt-in, even if we can't support it everywhere.

I agree, though my concern is that latter part of that sentence. We could log a warning as you suggested, though that'd be easy to miss and wouldn't be accurate because the user might've taken appropriate measures already and we have no good way to detect that.

But also, as a practical matter, Butane and Ignition configs aren't inherently platform-specific.

True, but whether your config holds secrets is config-specific. :)

(Alternative: we could have a tri-state, keep/delete/try-delete, defaulting to try-delete.)

Hmm, yeah that could work.

For platforms in which we can only firewall, maybe we could for FCOS/RHCOS have Butane sugar to make that easier. E.g. have it create a stamp file and then add a systemd generator to https://github.com/coreos/fedora-coreos-config which sets up the rules for the specific platform we're on?

@cgwalters
Copy link
Member

As openshift/enhancements#443 notes, the OpenShift SDN currently blocks link-local addresses which is the main mitigation against this from an OCP perspective.

However, note that there definitely exist containers which expect to be able to retrieve user data; those containers today can join the host network namespace to skip the SDN firewalling. If the firewalling is at the host level they'd have to go through further hoops to e.g. remove the iptables rules inside their own netns.

I found kubernetes-sigs/cluster-api-provider-aws#1875 in my web history which is related too.

@bgilbert bgilbert added the jira for syncing to jira label Feb 4, 2022
@bgilbert
Copy link
Contributor Author

bgilbert commented Feb 4, 2022

@jlebon and I discussed this more OOB. Some (reconstructed) notes:

  • We've never published any guidance on secret storage. The obvious way to do this "properly" with Ignition is to put secrets in child configs hosted on a special server, which deletes them once provisioning is complete. That requires a) the parent config to be unique for each individual node, b) the secrets to be stored at a unique URL for each node being provisioned, and c) Ignition to inform the server once provisioning is complete.

    We could build a component that acts as the server (as a deployable service; it's not obvious how to do it securely as a hosted one), but the user's infrastructure would still have to handle (a) and (b), so probably a dedicated server component is not helpful enough to be worthwhile. Maybe this functionality would make more sense integrated into tools like Matchbox. (c) could be implemented by having the config server send an HTTP reply header giving Ignition a callback URL to hit once provisioning is complete.

    On the other hand, tools for cluster-level secret storage already exist and maybe we shouldn't reinvent the wheel. People are putting secrets in configs, though, so we shouldn't ignore this issue entirely.

  • There's a legitimate use case for not deleting configs: nodes might be reprovisioned and expect to re-fetch the Ignition config from userdata. For example, VMware nodes might run entirely from live PXE, so every boot would need to fetch the Ignition config. I don't think this is a common use case today - for example, PXE nodes probably use ignition.platform.id=metal ignition.config.url= - but it's a supported one. So we probably can't just implement try-delete without an override.

  • We didn't really make progress on the schema question. I don't think Butane firewall sugar will work (Butane would have to know both the platform ID and the right way to do firewalling for the user's cloud orchestrator), and suspect defaulting to delete will just cause everyone to set try-delete without actually creating the firewall rules we want to encourage. @jlebon doesn't want to implicitly encourage users to skip the needed firewall rules on platforms where that matters.

  • Userdata deletion should notionally happen when firstboot is marked complete, which is the OS' responsibility. It seems safe to do the deletion first (Ignition will just no-op if it reruns) but that might be confusing.

  • The technical implementation of userdata deletion is straightforward and best done in the Ignition codebase. In particular, on VMware we sometimes have to modify an XML document and write it back to the guestinfo property. In the short term, it might make sense to implement config deletion as a separate tool (or multicall binary) in this repo and let the OS invoke it, applying whatever policy it wants (perhaps driven by a config file written by the Ignition config). That lets OSes evolve the policy over time without encoding policy in a stable spec. Once we have a better sense of the tradeoffs, we can always drop the separate binary and have Ignition implement a policy directly.

@bgilbert
Copy link
Contributor Author

bgilbert commented Feb 4, 2022

@cgwalters

However, note that there definitely exist containers which expect to be able to retrieve user data

Really? Why?

@cgwalters
Copy link
Member

Really? Why?

I think in some cases e.g. kubelet uses this to determine which availability zone it's in. It's not about accessing the user data, but more about instance metadata in general; an IP-level firewall can't block one but not the other.

There's definitely also 3rd party containers which read instance metadata.

I hesitate to say it, but the real solution may need to be something like a transparent proxy which knows about each cloud and filters out just the user data to start. i.e. we use iptables to redirect requests to the proxy process.

@cgwalters
Copy link
Member

(My initial comment mentioned "user data" when I definitely meant "instance metadata")

@pothos
Copy link
Contributor

pothos commented Apr 22, 2022

With #1351 the situation for Azure changed when IMDS is used which will stay available for all clients on the instance (but the docs call out that secrets shouldn't be placed there).

@bgilbert
Copy link
Contributor Author

Thanks, updated the description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants