Encrypted image can be pulled on the host but decrypted in the guest, revealing secrets #162
I don't think policy by itself solves the problem, though it would be a way to express what is expected by way of mounts. We certainly need a test for the scenario you have uncovered, and many more tests for scenarios where we manipulate the way the host interacts with the guest and tries to change what resources are provided to the guest.
@fitzthum, please do not take my comment as "this is not an issue"; that's not the case. But I must add the following for the sake of understanding the problem, in order to narrow it down and figure out a good mitigation plan. Using the non-forked containerd with Confidential Containers is, as of right now, not a supported scenario. With that said, I'd be super interested to see whether we can reproduce the issue with the not-yet-released "pulling on the guest" approach. One thing that I'd like to ask you to provide, if possible, is the QEMU command line used to start the pod sandbox, as it sounds like we may be doing something wrong there. I'm mostly interested to understand a few things:
Yes, a host that uses this attack would technically be using CoCo in an unsupported way. We currently have no way to prevent them from doing so, which is what makes this dangerous.

This also hints at why our tests did not catch this. We haven't implemented any tests that cover improper installations. We do have a test that makes sure the image is not pulled on the host, but we've only run it with the forked containerd. In the future we should probably think about doing more "adversarial" testing on the APIs that are exposed to untrusted components. No matter how the host is configured, we can't expose guest secrets to it.

I phrased this issue in terms of the forked containerd, which suggests that it is more of a problem with the old approach, but in fact I discovered it when testing the latest Kata CI bundle. You will run into the problems described here with the new approach with upstream containerd if you simply don't set up the snapshotter, which happens to be exactly what the operator does right now, since we haven't merged the nydus support to the operator yet. Even once that is merged, the host can simply disable the snapshotter without the guest noticing.

I have not tested with the snapshotter enabled. When we use nydus with integrity protection, the rootfs should be covered by dm-verity. In theory that won't be affected. I'm not sure what will happen when we use the snapshotter to pull inside the guest. Like I said, if the host disables the snapshotter we will have a problem.

Here is the QEMU command
I think the significant part is probably
Since the QEMU command line is not measured and the agent APIs that result in mounting things are not measured, we have very little control over what ends up mounted in the guest.
Yeah, setting
Wouldn't this break the host-pulling stuff? I don't think setting the

I haven't tested with peer pods, but I am assuming that they wouldn't be affected, because the host can't really mount things into the guest regardless of configuration. That said, it might actually be possible to execute a similar attack over the network. I would like to totally remove the shared fs from CoCo, but I had thought there were still some requirements, such as some of the network configs. I guess peer pods must have a workaround for this, or maybe I am misremembering. Either way, I think it would be good to cut down on sharing. That said, there might be some legitimate uses of sharing, such as host-pulling or some potential confidential storage implementations.
Not sure.
The operator should be responsible for always reconciling the configuration file. However, I can clearly see an issue of someone adding, for instance,

Regardless of the QEMU command line, can the Kata Containers / containerd configuration files be validated as part of the measurement?
I don't think we really have hardware support for this. Our best bet will probably be validating things from inside the guest, with the assumption that a malicious host could override the operator and make the shim do absolutely anything.
So, I was having a convo with @stevenhorsman and we think there are a few things that would be good to try.
Most likely a combination of those 3.
I'm not entirely following what is happening here. If vanilla containerd is used, it will try to pull and unpack an encrypted image, which it can't handle. Even if vanilla containerd pulls the container image, surely kata-agent does not attempt to decrypt payloads coming from the host like this? Going further -
This seems to be exactly what is happening. It's kind of hard to believe. It might be good if someone replicated what I describe just to make sure I didn't dream the entire thing.

Keep in mind that even if we pull the image inside the guest, if the host can create shared directories that overlap with the rootfs, we have a problem. The behavior of the pause image might be an example of this. I still need to figure out exactly how we can pull on the host but unpack on the guest. I suspect that those actions are simply driven by different agent endpoints. I don't think measuring stuff that lives on the host is going to be feasible. I have a few vague ideas for solutions, but none of them is ideal.
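To make the overlap concern concrete, here is a hedged sketch of the kind of check a guest-side component could apply before accepting a host share. The function name and paths are hypothetical illustrations, not the actual kata-agent API:

```python
import os

def share_overlaps_rootfs(share: str, rootfs: str) -> bool:
    """Return True when a host-requested share and a container rootfs
    overlap (either path contains the other), which would let the host
    read or modify container state. Hypothetical check, not the real
    kata-agent mount-handling logic."""
    share = os.path.normpath(share)
    rootfs = os.path.normpath(rootfs)
    return (share == rootfs
            or share.startswith(rootfs + os.sep)
            or rootfs.startswith(share + os.sep))

# A share landing inside a container rootfs should be rejected:
print(share_overlaps_rootfs("/run/kata-containers/abc/rootfs/etc",
                            "/run/kata-containers/abc/rootfs"))   # True
# A share in an unrelated directory is fine:
print(share_overlaps_rootfs("/run/kata-containers/shared/mounts",
                            "/run/kata-containers/abc/rootfs"))   # False
```

A real implementation would also have to handle symlinks and bind-mount aliasing, which a pure path comparison cannot see.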
I don't think a change to the kata config file can help here, simply because the config file is not measured. The problem to me seems much more general and probably needs to be addressed differently:
Encrypted rootfs is an example, but a malicious shim could most likely, for example, convert a mountpoint that was supposed to be a tmpfs inside the VM into a shared folder, and this way access data that was not supposed to leave the VM. There are probably other types of attack that could be carried out by creating additional mountpoints, even without a shared filesystem.
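The tmpfs-to-shared-folder substitution could in principle be caught by a request-time policy keyed on destination and filesystem type. A minimal illustration; the table and function names are invented for this sketch and are not the real agent policy format:

```python
# Hypothetical per-destination expectations that a workload owner could
# ship into the guest as policy (illustrative names, not the kata-agent API).
EXPECTED_FSTYPE = {
    "/tmp": "tmpfs",
    "/run/secrets": "tmpfs",
}

def allow_storage(dest: str, fstype: str) -> bool:
    """Reject a storage request whose filesystem type differs from what
    the policy expects at that destination (e.g. a tmpfs silently
    replaced by a host-backed shared filesystem)."""
    expected = EXPECTED_FSTYPE.get(dest)
    return expected is None or expected == fstype

print(allow_storage("/run/secrets", "tmpfs"))     # True
print(allow_storage("/run/secrets", "virtiofs"))  # False: host swapped in a share
```

This is essentially what the agent API restriction policies mentioned elsewhere in this thread would express, just reduced to a lookup table.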
So, let's take a step back, and please bear with me. All the possible attacks have one specific root cause: a "misconfiguration" on the host OS. Is this statement correct or incorrect? If it is correct ... If it is incorrect ... I will ask more questions after I get the answers to those.
Ok, I have unraveled the mystery a bit. Like @jepio, I didn't think it was very plausible that the image could be downloaded on the host but decrypted inside of the guest. It turns out it isn't. The image pulling and unpacking are atomic, as they should be.

This isn't to say that we don't have a problem here. We still have a huge issue with unvalidated filesystem sharing. To exploit this attack vector, though, will require a malicious containerd (or shim), rather than simply installing the upstream one. I hope to provide an example of this soon.

So why were multiple people able to consistently run an encrypted image that was pulled on the host? It turns out that Kubernetes was doing some very sneaky caching. The encrypted and unencrypted test images have the same content. In the CI and in local manual testing, if the unencrypted image had already been downloaded on the host, then the encrypted image would not be downloaded (even though containerd would say that it was downloaded, and even though the image pull policy was set to always). I'm not yet sure exactly where this caching is happening. Hopefully I can confirm tomorrow. This does reveal a subtle issue with our CI. We should probably update these images so that they don't have identical rootfs; otherwise we could pick up false positives.

So in short, this is an unvalidated sharing issue rather than an image pulling issue. The biggest thing to check is whether a malicious shim can ask the agent to create a shared mount on top of the rootfs when the container is pulled in the guest. I will try to check this directly tomorrow.
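A cheap guard for the CI gap described above would be to compare the `rootfs.diff_ids` of the encrypted and unencrypted test images before running the pull test: identical chains mean a cached unencrypted rootfs can mask a failure. A sketch, with made-up digests:

```python
def same_rootfs(cfg_a: dict, cfg_b: dict) -> bool:
    """Two OCI images unpack to the same rootfs when the diff_ids chains
    in their configs match, even if the (encrypted) layer blobs listed
    in their manifests differ."""
    return cfg_a["rootfs"]["diff_ids"] == cfg_b["rootfs"]["diff_ids"]

# Hypothetical configs for the encrypted/unencrypted CI test images:
encrypted = {"rootfs": {"type": "layers",
                        "diff_ids": ["sha256:1111", "sha256:2222"]}}
unencrypted = {"rootfs": {"type": "layers",
                          "diff_ids": ["sha256:1111", "sha256:2222"]}}

print(same_rootfs(encrypted, unencrypted))  # True: the CI risks a false positive
```

In a real CI job the two configs would come from `crictl inspecti` or a registry client rather than literals.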
Please, take a look at: https://cloud-native.slack.com/archives/C039JSH0807/p1695618313572309?thread_ts=1695591000.697989&cid=C039JSH0807
Let me try to reply to my own question here after a convo that we had on Slack: https://cloud-native.slack.com/archives/C039JSH0807/p1696405223939269
"correct" seems to be the answer.
"correct" seems to be the answer. |
You could carry out the same attack with a properly configured containerd.
Okay, this is interesting.
Modifying the shim would probably produce very similar results. In fact, that is the most direct way to think about the issue. A host doesn't even have to use containerd or the shim, though. In theory, they could start the VM manually and send requests to the kata-agent manually, regardless of how the host was configured.
Is that something that could be crafted with a QEMU command line plus some script, to make it easier to reproduce the issue?
My hunch is that it will be easiest to reproduce with a fork of the shim or with crictl. I am working on an example.
I have a reproducible example of the containerd caching behavior that I mentioned above. I will probably split this into a different issue once I understand it better, but let me leave this here for now since it is very interesting. If I try pulling an encrypted image directly via containerd with crictl (after deleting any existing images on the host), it doesn't work, which is what we would expect.
Pulling the unencrypted image works fine.
Now if I go back and pull the encrypted image again (after pulling the unencrypted one), the image pulls successfully.
What is going on here? It seems like maybe there is some layer caching mechanism, but the image manifests don't have the same hashes for any of the layers. The rootfs of both containers is the same, however.
Ok, the encrypted and unencrypted containers share the same top layer.
If you look at the rootfs section, you can see that there is a hash for each layer. These hashes are the same for the encrypted and unencrypted image. This is probably what containerd is using to cache images.

This seems a little bit scary at first, because it allows containerd to unpack an encrypted image without the key, but that will only work if the unencrypted image is already present on the host. While this is technically an information leak about the unencrypted image that could be leveraged in brute-force attacks, it probably isn't something we should worry about. That said, we should update our tests to make sure we don't use identical encrypted and unencrypted images, because that can create false positives.

Btw, note that basically everything specified in the Dockerfile gets added to this layer, which is not protected or validated. (Don't worry, that private key is supposed to be there for the demo.)
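The deduplication described here is consistent with how containerd keys unpacked snapshots: by the chain ID derived from the config's `rootfs.diff_ids`, not by the (possibly encrypted) layer blob digests in the manifest. A minimal sketch of the OCI chain-ID computation; the digest strings are made up:

```python
import hashlib

def chain_id(diff_ids):
    """OCI image-spec chain ID:
       ChainID(L1)     = DiffID(L1)
       ChainID(L1..Ln) = sha256(ChainID(L1..Ln-1) + " " + DiffID(Ln))"""
    cid = diff_ids[0]
    for diff in diff_ids[1:]:
        cid = "sha256:" + hashlib.sha256(f"{cid} {diff}".encode()).hexdigest()
    return cid

# Identical diff_ids produce identical chain IDs, so the already-unpacked
# snapshot is reused no matter which manifest the pull came from.
a = chain_id(["sha256:aaa", "sha256:bbb"])
b = chain_id(["sha256:aaa", "sha256:bbb"])
print(a == b)  # True
```

This explains why deleting the manifests alone isn't enough in testing: as long as a snapshot with the same chain ID exists, the "new" image resolves to it.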
Please, take a look at: https://cloud-native.slack.com/archives/C039JSH0807/p1695618313572309?thread_ts=1695591000.697989&cid=C039JSH0807 (2nd try). In other words, we can update the tests, of course; or we can simply play with the order of the tests; or we can simply delete the images in between the different runs. This is well-known info, no news here.
Which images are you comparing specifically? Using your above examples, there don't seem to be common hashes:
I've also added
to the issue I've created for merging encrypted image support to main (kata-containers/kata-containers#8111) so that we do this better in the next round.
Has anyone looked into using the Integrity Measurement Architecture (IMA) and the Extended Verification Module (EVM) to measure and appraise the host? https://wiki.gentoo.org/wiki/Integrity_Measurement_Architecture
I am looking at |
I have thought about this some. My general feeling is that we should stick to our current architecture, where the host is deliberately not trusted. I think this is a feature of the project rather than a shortcoming. When we start to think about measuring the host via a TPM, it opens a huge can of worms. For instance, it's not really clear how the client should validate these extra measurements, particularly given that they don't operate the host.
I do not see how this changes the architecture if we reduce the attack vector; the host is still deliberately untrusted. You do not have to use the TPM at all. Can you elaborate on the huge can of worms?
Maybe I'm not following exactly what you are proposing, but I am assuming that if you want to use something like IMA to validate the host, you will want that to be built on some kind of root of trust. How would you do something like that without a TPM?
Generally the client is the person who brings the workload and validates the measurement. There is a little bit of ambiguity between a workload owner and a data provider, but basically we are talking about the guest and whoever it is operating on behalf of.
No. The hardware evidence provided to the guest guarantees that the PodVM is in a valid state. If a host does not provide a valid PodVM, it is a denial of service attack. Prevention of DoS is a non-goal of the project (because it is a non-goal of confidential computing). There are critical components on the host, but they are untrusted.
The CoCo operator installs some things on the host, changes some configurations, and adds runtime classes to Kubernetes. Other than that, the project is mainly agnostic to what is running on the host. It would be a significant shift to begin mandating a broader host configuration. In theory we would only need to validate the pieces installed by the operator, but realistically we would need to measure more than that to get firm guarantees.

When it comes to the PodVM, it's relatively easy for the project to define what it should look like and for the client to know what to expect from the measurement. When it comes to the host, there is a huge variety of valid configurations. One particular issue is that some deployments will rely on proprietary host components, such as proprietary hypervisors. Validating those as a guest owner would be very difficult.

Another thing that simplifies measuring the PodVM is that it is static. When we start thinking about measuring the host, we run into questions about updates. What happens if the host is updated while a pod is running? Would we have to re-attest it? However you slice it, our attestation mechanism would be getting a lot more complicated.

These are just the first questions that come to mind, but I think the main issue is higher level. I see host measurements and confidential computing as fundamentally different approaches. That said, I think it would actually make sense in some cases to use both. Host measurement could be an additional level of security. This should really be set up by the CSP, however, or whoever owns the nodes. I think it is way out of scope for our project. Keep in mind that we'd have to figure it out for peer pods and enclave-cc as well.
Btw, I have created a simple reproducible example of an attack that takes advantage of unvalidated mounts. It is a little bit more circuitous than what I originally described, but hopefully I will be able to refine it a bit in the next few days. I am waiting to share the code, but if anyone is particularly interested, let me know and we can discuss. We should probably start formalizing a disclosure process.
This is a serious issue involving many different components, so I have created it in our general repo.
By using Confidential Containers with upstream containerd (or with the containerd fork without the appropriate `cri_handler` enabled) and without the Nydus snapshotter configured, a malicious host can trick Kata into pulling an encrypted container image on the host and mounting it into the guest, where it will be decrypted using the Attestation Agent.

After the container has been unpacked, it will still be visible on the host. This reveals any sensitive information that was in the encrypted image. The host will also be able to write to the rootfs of the container, which can influence workload control flow. This malicious configuration can be enabled by the host without the guest/client being aware of it.
This attack affects most runtime classes (not peer pods or enclave-cc). Along with the upcoming v0.8.0, at least one previous release (v0.7.0) is vulnerable.
It is easy to reproduce the problem. Note that these are also the steps a malicious host would follow to steal secrets from a guest.

1. Remove `/etc/systemd/system/containerd.service.d/containerd-for-cc-override.conf` (and `sudo systemctl daemon-reload` and `sudo systemctl restart containerd`).
2. Start a pod from an encrypted image and look on the host at `/run/kata-containers/shared/sandboxes/<sandbox-id>/mounts/<cid>/rootfs`.
3. Any secrets in the image, such as an `.ssh/authorized_keys` file, are visible there.
You can also use the debug console to inspect the guest from the inside. If you run `cat /proc/mounts` from inside the guest, you'll see the container rootfs being mounted from the host.

A similar attack can be used against signed containers (without dm-verity). Since the host can write to the rootfs in the guest, the host can modify the container image after the signature has been validated.
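The `cat /proc/mounts` check can be mechanized. This sketch scans mount entries for container rootfs directories that are served from the host over virtiofs; the sample data is fabricated, and real mount sources and paths will differ by deployment:

```python
# Fabricated excerpt of what /proc/mounts might show inside a guest
# whose container rootfs is host-backed (illustrative only).
SAMPLE = """\
kataShared /run/kata-containers/shared/containers virtiofs rw,relatime 0 0
kataShared /run/kata-containers/abc/rootfs virtiofs rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev 0 0
"""

def host_backed_rootfs(proc_mounts: str):
    """Return mount points that look like container rootfs directories
    served from the host over virtiofs -- the condition described above."""
    hits = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "virtiofs" and fields[1].endswith("/rootfs"):
            hits.append(fields[1])
    return hits

print(host_backed_rootfs(SAMPLE))  # ['/run/kata-containers/abc/rootfs']
```

Inside a real guest you would pass `open("/proc/mounts").read()` instead of the sample string; an empty result is what a correctly isolated guest should produce.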
There is also a similar issue affecting the `pause` container. The pause container is pulled inside the guest, but even when we use the containerd fork, the rootfs of the pause container is exposed to the host. The host could overwrite the `pause` binary to execute arbitrary code inside the guest (albeit inside a container).

One of the fundamental issues here is that we do no validation of what is mounted inside the guest. One possible solution would be to use the new agent API restriction policies to block the agent from mounting harmful directories. That said, I am not sure we want to rely on the user to close an enormous hole in the trust model. I am not yet sure what the best way to fix this issue will be. I don't entirely understand how we are able to pull a container image on the host (which I assume we must be doing) but still unpack/decrypt it in the guest. This is another angle to consider.
We should perhaps use this issue as inspiration to develop a more formal security process so that future bugs of this magnitude can be handled via a standard procedure.