
Encrypted image can be pulled on the host but decrypted in the guest, revealing secrets #162

Open
fitzthum opened this issue Oct 2, 2023 · 31 comments


@fitzthum (Member) commented Oct 2, 2023

This is a serious issue involving many different components, so I have created it in our general repo.

By using Confidential Containers with upstream containerd (or with the containerd fork without the appropriate cri_handler enabled) and without the Nydus snapshotter configured, a malicious host can trick Kata into pulling an encrypted container image on the host and mounting it into the guest where it will be decrypted using the Attestation Agent.

After the container has been unpacked, it will still be visible on the host. This reveals any sensitive information that was in the encrypted image. The host will also be able to write to the rootfs of the container, which can influence workload control flow. This malicious configuration can be enabled by the host without the guest/client being aware of it.

This attack affects most runtime classes (not peer pods or enclave-cc). Along with the upcoming v0.8.0, at least one previous release (v0.7.0) is vulnerable.

It is easy to reproduce the problem. Note that these are also the steps a malicious host would follow to steal secrets from a guest.

  1. Install Confidential Containers via the operator.
  2. Bypass the containerd fork by deleting /etc/systemd/system/containerd.service.d/containerd-for-cc-override.conf (and sudo systemctl daemon-reload and sudo systemctl restart containerd).
  3. Start an encrypted or unencrypted container (if you start an encrypted container, remember to start a KBS).
  4. Inspect /run/kata-containers/shared/sandboxes/<sandbox-id>/mounts/<cid>/rootfs
    • If you're using one of our demo images, try looking in .ssh/authorized_keys

You can also use the debug console to inspect the guest from the inside. If you run cat /proc/mounts from inside the guest, you'll see the container rootfs being mounted from the host.
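For concreteness, here is roughly what those steps look like on the command line (a sketch; the sandbox ID, container ID, and the demo image's home directory are placeholders):

# On the host: bypass the containerd fork and restart containerd
sudo rm /etc/systemd/system/containerd.service.d/containerd-for-cc-override.conf
sudo systemctl daemon-reload && sudo systemctl restart containerd

# On the host: the decrypted rootfs is visible under the shared sandbox directory
ls /run/kata-containers/shared/sandboxes/<sandbox-id>/mounts/<cid>/rootfs
cat /run/kata-containers/shared/sandboxes/<sandbox-id>/mounts/<cid>/rootfs/root/.ssh/authorized_keys

# In the guest (debug console): the container rootfs shows up as a host-backed mount
grep rootfs /proc/mounts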

A similar attack can be used against signed containers (without dm-verity). Since the host can write to the rootfs in the guest, the host can modify the container image after the signature has been validated.

There is also a similar issue affecting the pause container. The pause container is pulled inside the guest, but even when we use the containerd fork, the rootfs of the pause container is exposed to the host. The host could overwrite the pause binary to execute arbitrary code inside the guest (albeit inside a container).

One of the fundamental issues here is that we do no validation of what is mounted inside the guest. One possible solution would be to use the new agent API restriction policies to block the agent from mounting harmful directories. That said, I am not sure we want to rely on the user to close an enormous hole in the trust model. I am not yet sure what the best way to fix this issue will be. I don't entirely understand how we are able to pull a container image on the host (which I assume we must be doing) but still unpack/decrypt it in the guest. This is another angle to consider.

We should perhaps use this issue as inspiration to develop a more formal security process so that future bugs of this magnitude can be handled via a standard procedure.

@magowan (Member) commented Oct 2, 2023

I don't think policy by itself solves the problem, though it would be a way to express what is expected by way of mounts.
We need to verify that the location used to unpack the container images is under the control of the guest.
Meaning we are able to assert this for all scenarios (TEE-backed RAM disk, block devices, etc.) and that its encryption is controlled by the guest/TEE.
Is there an easy common way to determine this?

We certainly need a test for the scenario you have uncovered, and a lot more tests for scenarios where we manipulate the way the host interacts with the guest and try to change what resources are provided to it.

@fidencio (Member) commented Oct 2, 2023

@fitzthum, please do not take my comment as "this is not an issue"; that's not the case. But I must add the following for the sake of understanding the problem, in order to try to narrow it down and work out a good mitigation plan.

Using the non-forked containerd with the Confidential Containers, as of right now, is not a supported scenario.

With that said, I'd be super interested if we can reproduce the issue with the not-yet-released "pulling on the guest" approach.

One thing that I'd like to ask you to provide, if possible, is the QEMU command-line used to start the pod sandbox, as it sounds like we may be doing something wrong there. I'm mostly interested in understanding a few things:

  1. What we're doing wrong from the Kata Containers side
  2. Whether this affects the pull on the guest approach using a snapshotter
  3. How to add tests to cover this
  4. How to create a well-specified set of supported scenarios for what we're doing, and from that figure out ways to block such attacks in the different projects we're developing / relying on.

@fitzthum (Member, Author) commented Oct 2, 2023

Yes, a host that uses this attack would technically be using CoCo in an unsupported way. We currently have no way to prevent them from doing so, which is what makes this dangerous. This also hints at why our tests did not catch this. We haven't implemented any tests that cover improper installations. We do have a test that makes sure the image is not pulled on the host, but we've only run it with the forked containerd. In the future we should probably think about doing more "adversarial" testing on the APIs that are exposed to untrusted components. No matter how the host is configured, we can't expose guest secrets to it.

I phrased this issue in terms of the forked containerd, which suggests that it is more of a problem with the old approach, but in fact I discovered it when testing the latest Kata CI bundle. You will run into the problems described here with the new approach with upstream containerd if you simply don't set up the snapshotter, which happens to be exactly what the operator does right now, since we haven't merged the nydus support into the operator yet. Even once that is merged, the host can simply disable the snapshotter without the guest noticing.

I have not tested with the snapshotter enabled. When we use nydus with integrity protection, the rootfs should be covered by dm-verity. In theory that won't be affected. I'm not sure what will happen when we use the snapshotter to pull inside the guest. Like I said, if the host disables the snapshotter we will have a problem.
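To make the "host silently drops the snapshotter" point concrete, this is the kind of host-side-only state the guest has no visibility into (a sketch; the config path and layout are assumptions about a typical containerd install):

# Is the nydus remote snapshotter referenced at all?
grep -n 'nydus' /etc/containerd/config.toml

# Which snapshotter, if any, is assigned to the kata runtime handler?
grep -A10 'kata' /etc/containerd/config.toml | grep 'snapshotter'

Nothing here is measured or attested, so the guest cannot tell whether it was changed.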

Here is the QEMU command

/opt/kata/bin/qemu-system-x86_64 -name sandbox-011b3a11924491f942813b122200c085b2bdaa1c9458d4d8c8d49cff90878198 -uuid 44d9ef69-3c9b-412f-a8a0-664f98344d37 -machine q35,accel=kvm,kernel_irqchip=split,confidential-guest-support=sev -cpu host,pmu=off -qmp unix:fd=3,server=on,wait=off -monitor unix:path=/run/vc/vm/011b3a11924491f942813b122200c085b2bdaa1c9458d4d8c8d49cff90878198/hmp.sock,server=on,wait=off -m 2048M,slots=10,maxmem=257720M -device pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=off,addr=2,io-reserve=4k,mem-reserve=1m,pref64-reserve=1m -device virtio-serial-pci,disable-modern=false,id=serial0 -device virtconsole,chardev=charconsole0,id=console0 -chardev socket,id=charconsole0,path=/run/vc/vm/011b3a11924491f942813b122200c085b2bdaa1c9458d4d8c8d49cff90878198/console.sock,server=on,wait=off -device virtio-scsi-pci,id=scsi0,disable-modern=false -object sev-guest,id=sev,cbitpos=51,reduced-phys-bits=5,policy=3,kernel-hashes=on -drive if=pflash,format=raw,readonly=on,file=/opt/kata/share/ovmf/AMDSEV.fd -object rng-random,id=rng0,filename=/dev/urandom -device virtio-rng-pci,rng=rng0 -device vhost-vsock-pci,disable-modern=false,vhostfd=4,id=vsock-3368031304,guest-cid=3368031304 -device virtio-9p-pci,disable-modern=false,fsdev=extra-9p-kataShared,mount_tag=kataShared -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/011b3a11924491f942813b122200c085b2bdaa1c9458d4d8c8d49cff90878198/shared,security_model=none,multidevs=remap -netdev tap,id=network-0,vhost=on,vhostfds=5,fds=6 -device driver=virtio-net-pci,netdev=network-0,mac=a2:a8:d5:70:dd:d0,disable-modern=false,mq=on,vectors=4 -rtc base=utc,driftfix=slew,clock=host -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults -nographic --no-reboot -object memory-backend-ram,id=dimm1,size=2048M -numa node,memdev=dimm1 -kernel /opt/kata/share/kata-containers/vmlinuz-5.19.2-114-sev -initrd /opt/kata/share/kata-containers/kata-ubuntu-20.04-sev.initrd -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k cryptomgr.notests net.ifnames=0 pci=lastbus=0 console=hvc0 console=hvc1 debug panic=1 nr_cpus=1 selinux=0 scsi_mod.scan=none agent.log=debug agent.debug_console agent.debug_console_vport=1026 agent.config_file=/etc/agent-config.toml agent.enable_signature_verification=false -pidfile /run/vc/vm/011b3a11924491f942813b122200c085b2bdaa1c9458d4d8c8d49cff90878198/pid -smp 1,cores=1,threads=1,sockets=1,maxcpus=1

I think the significant part is probably

-device virtio-9p-pci,disable-modern=false,fsdev=extra-9p-kataShared,mount_tag=kataShared -fsdev local,id=extra-9p-kataShared,path=/run/kata-containers/shared/sandboxes/011b3a11924491f942813b122200c085b2bdaa1c9458d4d8c8d49cff90878198/shared,security_model=none,multidevs=remap 

Since the QEMU command line is not measured and the agent APIs that result in mounting things are not measured, we have very little control of what ends up mounted in the guest.

@fidencio (Member) commented Oct 2, 2023

Yeah, setting shared_fs = none would be a reasonable way to mitigate that, as that would lead to using exactly the same mechanism that's used with peer-pods, which (AFAIU from the first comment) is not affected.
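For reference, a sketch of what that looks like on the Kata side (the full config path below is hypothetical; the exact location depends on the install):

# In the [hypervisor.qemu] section of the Kata configuration.toml:
#   shared_fs = "none"
# Quick check of what a given install is using:
grep -n 'shared_fs' /opt/kata/share/defaults/kata-containers/configuration.toml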

@fitzthum (Member, Author) commented Oct 2, 2023

Yeah, setting shared_fs = none would be a reasonable way to mitigate that, as that would lead to using exactly the same mechanism that's used with peer-pods, which (AFAIU from the first comment) is not affected.

Wouldn't this break the host-pulling stuff? I don't think setting the shared_fs arg in the Kata config will actually change the measurement anyway, so the host could just turn sharing back on.

I haven't tested with peer pods, but I am assuming that it wouldn't be affected because the host can't really mount things to the guest regardless of configuration. That said, it might actually be possible to execute a similar attack over the network.

I would like to totally remove the shared fs from CoCo, but I had thought there were still some requirements, such as some of the network configs. I guess peer pods must have a workaround for this or maybe I am misremembering. Either way I think it would be good to cut down on sharing. That said, there might be some legitimate uses of sharing, such as the host-pulling or some potential confidential storage implementations.

@fidencio (Member) commented Oct 2, 2023

Wouldn't this break the host-pulling stuff?

Not sure.

I don't think setting the shared_fs arg in the Kata config will actually change the measurement anyway, so the host could just turn sharing back on.

The operator should be responsible for always reconciling the configuration file. However, I can clearly see an issue of someone adding, for instance, an /etc/kata-containers/configuration.toml file, which would take precedence over the /opt/kata/... one.

Regardless of the QEMU command line, can we have the Kata Containers / containerd configuration files validated as part of the measurement?
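To make the override concern above concrete, a hypothetical sequence a malicious host could run (the source path under /opt/kata/... and the sed target are assumptions; the precedence is the one described in the previous comment):

sudo mkdir -p /etc/kata-containers
sudo cp /opt/kata/share/defaults/kata-containers/configuration.toml /etc/kata-containers/
sudo sed -i 's/shared_fs = "none"/shared_fs = "virtio-9p"/' /etc/kata-containers/configuration.toml
# The shadowed config takes precedence, and nothing in the guest's measurement changes.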

@fitzthum (Member, Author) commented Oct 2, 2023

Regardless of the QEMU command line, can we have the Kata Containers / containerd configuration files to be validated as part of the measurement?

I don't think we really have hardware support for this. Our best bet will probably be validating things from inside the guest, with the assumption that a malicious host could override the operator and make the shim do absolutely anything.

@fidencio (Member) commented Oct 3, 2023

So, I was having a convo with @stevenhorsman and we think there are a few things that would be good to try.

  1. Trusted storage (which is listed as presented in the Sep 1st, 2022 meeting), which most likely should be enabled by default
  2. Having the kata-containers configuration as part of the policy
  3. Not sure if doable, but having the containerd configuration as part of the policy -- this would require a very good understanding of what the provider's containerd config is

Most likely a combination of those three.

@jepio (Member) commented Oct 3, 2023

I'm not entirely following what is happening here. If vanilla containerd is used, it will try to pull and unpack an encrypted image, which it can't handle. Even if vanilla containerd pulls the container image, surely kata-agent does not attempt to decrypt payloads coming from the host like this?

Going further - shared_fs is a non-CoCo thing and only incidentally enabled for CoCo. We need encrypted storage for encrypted container images.

@fitzthum (Member, Author) commented Oct 3, 2023

Even if vanilla containerd pulls the container image, surely kata-agent does not attempt to decrypt payloads coming from the host like this?

This seems to be exactly what is happening. It's kind of hard to believe. It might be good if someone replicated what I describe just to make sure I didn't dream the entire thing.

Keep in mind that even if we pull the image inside the guest, if the host can create shared directories that overlap with the rootfs, we have a problem. The behavior of the pause image might be an example of this. I still need to figure out exactly how we can pull on the host but unpack on the guest. I suspect that those actions are simply driven by different agent endpoints.

I don't think measuring stuff that lives on the host is going to be feasible. I have a few vague ideas for solutions, but none of them is ideal.

  1. Add a check in image-rs to make sure it unpacks images to directories that are not shared. In theory this is pretty simple to implement (see the sketch after this list), but it's a bit hacky and non-generic, and I think it might not actually work for every case (can a mount be created after the image is unpacked, for instance?).
  2. Use policy to validate all mounts. This might not require any changes, although it would only work in main, and like I mentioned it would put a significant burden on the user.
  3. Make pull/unpack atomic. We could modify the agent or image-rs to make sure that pulling and unpacking always happen together, either by changing the API or adding state. That said, this wouldn't prevent creation of dangerous mounts and it might be hard to reconcile with standard Kata.
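As a sketch of what the check in option 1 could amount to, expressed in shell for illustration (image-rs would do this in Rust, and the unpack path below is hypothetical):

# Refuse to unpack into a directory that is backed by a host share (9p/virtiofs)
# rather than guest-local storage.
target=/run/kata-containers/image            # hypothetical unpack root
mnt=$(df --output=target "$target" | tail -1)
fstype=$(awk -v m="$mnt" '$2 == m {print $3}' /proc/mounts | tail -1)
case "$fstype" in
  9p|virtiofs) echo "refusing to unpack: $target is host-visible ($fstype)" >&2; exit 1 ;;
  *) echo "ok: $target is guest-local ($fstype)" ;;
esac

As noted, this does not stop a shared mount from being created on top of the directory afterwards.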

@dbuono commented Oct 3, 2023

I don't think a change to the kata config file can help here, simply because the config file is not measured.
Even if it were, we have some use cases where we mount persistent volumes in the containers. Those mounts are done on the host and passed to the Kata VM through the shared fs. The problem is not enabling the shared filesystem per se: even if the container image is encrypted, it is very reasonable to share other data (possibly encrypted if necessary) using a PV.

The problem to me seems much more general and probably needs to be addressed differently:

The host, including the kata-shim, is not trusted. Therefore we cannot implicitly trust that the mountpoints received from the host are the ones the client expected/defined in the YAML.

Encrypted rootfs is an example, but a malicious shim could most likely, for example, replace a mountpoint that was supposed to be a tmpfs inside the VM with a shared folder, and this way access data that was not supposed to leave the VM. There are probably other types of attack that could be mounted by creating additional mountpoints, even without a shared filesystem.
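A guest-side illustration of that class of attack (a sketch): anything the workload expects to be private should not show up among host-backed mounts.

# From the debug console inside the guest: list every mount backed by the host share.
# The "kataShared" tag matches the QEMU command line earlier in this thread.
grep -E ' (9p|virtiofs) ' /proc/mounts
grep kataShared /proc/mounts
# A volume the pod spec declared as an emptyDir/tmpfs showing up here would mean
# its contents are visible to (and writable by) the host.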

@fidencio (Member) commented Oct 3, 2023

So, let's take a step back, and please, bear with me.

All the possible attacks have one specific root cause: a "misconfiguration" on the host OS. Is this statement correct or incorrect?

If this information is correct ...
This is a misconfiguration that comes from containerd. Correct or incorrect?

If this information is incorrect ...
What's the way to reproduce this in a "non misconfigured" host OS?

I will ask more questions after I get the answers for those.

@fitzthum (Member, Author) commented Oct 3, 2023

Ok, I have unraveled the mystery a bit.

Like @jepio, I didn't think it was very plausible that the image could be downloaded on the host but decrypted inside of the guest. It turns out it isn't. The image pulling and unpacking are atomic, as they should be. This isn't to say that we don't have a problem here. We still have a huge issue with unvalidated filesystem sharing. Exploiting this attack vector, though, will require a malicious containerd (or shim), rather than simply installing the upstream one. I hope to provide an example of this soon.

So why were multiple people able to consistently run an encrypted image that was pulled on the host? It turns out that Kubernetes was doing some very sneaky caching. The encrypted and unencrypted test images have the same content. It turns out that in the CI and in local manual testing, if the unencrypted image had already been downloaded on the host, then the encrypted image would not be downloaded (even though containerd would say that it was downloaded and even though the image pull policy was set to always). I'm not yet sure exactly where this caching is happening. Hopefully I can confirm tomorrow. This does reveal a subtle issue with our CI. We should probably update these images so that they don't have identical rootfs, otherwise we could pick up false positives.

So in short, this is an unvalidated sharing issue rather than an image pulling issue. The biggest thing to check is whether a malicious shim can ask the agent to create a shared mount on top of the rootfs when the container is pulled in the guest. I will try to check this directly tomorrow.

@fidencio (Member) commented Oct 4, 2023

So why were multiple people able to consistently run an encrypted image that was pulled on the host? It turns out that Kubernetes was doing some very sneaky caching.

Please, take a look at: https://cloud-native.slack.com/archives/C039JSH0807/p1695618313572309?thread_ts=1695591000.697989&cid=C039JSH0807

@fidencio (Member) commented Oct 4, 2023

Let me try to reply to my own question here after a convo that we had on Slack: https://cloud-native.slack.com/archives/C039JSH0807/p1696405223939269

So, let's take a step back, and please, bear with me.

All the possible attacks have one specific root cause: a "misconfiguration" on the host OS. Is this statement correct or incorrect?

"correct" seems to be the answer.

If this information is correct ... This is a misconfiguration that comes from containerd. Correct or incorrect?

"correct" seems to be the answer.

@fitzthum (Member, Author) commented Oct 4, 2023

"correct" seems to be the answer.

You could make the same attack with a properly configured containerd.

@fidencio (Member) commented Oct 4, 2023

You could make the same attack with a properly configured containerd.

Okay, this is interesting.
@fitzthum, would you mind expanding on that?

@fitzthum (Member, Author) commented Oct 4, 2023

Okay, this is interesting.
@fitzthum, would you mind expanding on that?

Modifying the shim would probably produce very similar results. In fact, that is the most direct way to think about the issue. A host doesn't even have to use containerd or the shim, though. In theory they could start the VM manually and send requests to the kata agent manually, regardless of how the host was configured.

@mkulke commented Oct 4, 2023

In theory they could start the VM manually and send requests to the kata agent manually, regardless of how the host was configured.

Is that something that could be crafted w/ a QEMU cmdline + some script to make it easier to reproduce the issue?

@fitzthum (Member, Author) commented Oct 4, 2023

Is that something that could be crafted w/ a QEMU cmdline + some script to make it easier to reproduce the issue?

My hunch is that it will be easiest to reproduce with a fork of the shim or with crictl. I am working on an example.

@fitzthum (Member, Author) commented Oct 4, 2023

I have a reproducible example of the containerd caching thing that I mention above. I will probably split this into a different issue once I understand it better, but let me leave this here for now since it is very interesting.

If I try pulling an encrypted image directly via containerd with crictl (after deleting any existing images on the host), it doesn't work, which is what we would expect.

DEBU[0000] get image connection                         
DEBU[0000] PullImageRequest: &PullImageRequest{Image:&ImageSpec{Image:ghcr.io/confidential-containers/test-container:encrypted,Annotations:map[string]string{},},Auth:nil,SandboxConfig:nil,} 
E1004 12:37:41.216975  701858 remote_image.go:238] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"ghcr.io/confidential-containers/test-container:encrypted\": failed to extract layer sha256:9733ccc395133a067f01ee6e380003d80fe9f443673e0f992ae6a4a7860a872c: failed to get stream processor for application/vnd.oci.image.layer.v1.tar+gzip+encrypted: ctd-decoder resolves to executable in current directory (./ctd-decoder): unknown" image="ghcr.io/confidential-containers/test-container:encrypted"
FATA[0001] pulling image: rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/confidential-containers/test-container:encrypted": failed to extract layer sha256:9733ccc395133a067f01ee6e380003d80fe9f443673e0f992ae6a4a7860a872c: failed to get stream processor for application/vnd.oci.image.layer.v1.tar+gzip+encrypted: ctd-decoder resolves to executable in current directory (./ctd-decoder): unknown

Pulling the unencrypted image works fine.

tobin@slmilan04:~/pods$ sudo crictl -D -r unix:///run/containerd/containerd.sock pull ghcr.io/confidential-containers/test-container:unencrypted
DEBU[0000] get image connection                         
DEBU[0000] PullImageRequest: &PullImageRequest{Image:&ImageSpec{Image:ghcr.io/confidential-containers/test-container:unencrypted,Annotations:map[string]string{},},Auth:nil,SandboxConfig:nil,} 
DEBU[0001] PullImageResponse: &PullImageResponse{ImageRef:sha256:3bf7ec31ad5b7744d08fe885a7b7306bb4ce6902d031d7b03a39a58d03b18a66,} 
Image is up to date for sha256:3bf7ec31ad5b7744d08fe885a7b7306bb4ce6902d031d7b03a39a58d03b18a66

Now if I go back and pull the encrypted image again (after pulling the unencrypted one), the image pulls successfully.

tobin@slmilan04:~/pods$ sudo crictl -D -r unix:///run/containerd/containerd.sock pull ghcr.io/confidential-containers/test-container:encrypted
DEBU[0000] get image connection                         
DEBU[0000] PullImageRequest: &PullImageRequest{Image:&ImageSpec{Image:ghcr.io/confidential-containers/test-container:encrypted,Annotations:map[string]string{},},Auth:nil,SandboxConfig:nil,} 
DEBU[0001] PullImageResponse: &PullImageResponse{ImageRef:sha256:b3d172cd5c0f17e02bf4e1f8c4c712794ed4c008ad676799c1674092e23e6409,} 
Image is up to date for sha256:b3d172cd5c0f17e02bf4e1f8c4c712794ed4c008ad676799c1674092e23e6409

What is going on here? It seems like maybe there is some layer caching mechanism, but the image manifests don't have the same hashes for any of the layers. The rootfs of both containers is the same, however.

@fitzthum (Member, Author) commented Oct 4, 2023

Ok, the encrypted and unencrypted containers share the same top layer.

{
  "created": "2023-03-31T16:44:49.685353442+08:00",
  "architecture": "amd64",
  "os": "linux",
  "config": {
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Entrypoint": [
      "/bin/sh",
      "-c",
      "/usr/sbin/sshd -D"
    ],
    "Labels": {
      "enc_key": "HUlOu8NWz8si11OZUzUJMnjiq/iZyHBJZMSD3BaqgMc=",
      "ssh_key": "-----BEGIN OPENSSH PRIVATE KEY-----b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAAAMwAAAAtzc2gtZWQyNTUxOQAAACAfiGV2X4o+6AgjVBaY/ZR2UvZp84dVYF5bpNZGMLylQwAAAIhawtHJWsLRyQAAAAtzc2gtZWQyNTUxOQAAACAfiGV2X4o+6AgjVBaY/ZR2UvZp84dVYF5bpNZGMLylQwAAAEAwWYIBvBxQZgk0irFku3Lj1Xbfb8dHtVM/kkz/Uz/l2h+IZXZfij7oCCNUFpj9lHZS9mnzh1VgXluk1kYwvKVDAAAAAAECAwQF-----END OPENSSH PRIVATE KEY-----"
    }
  },
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:9733ccc395133a067f01ee6e380003d80fe9f443673e0f992ae6a4a7860a872c",
      "sha256:eb0d4d347a335d2aca52dc62c411c5992784be7955f6d41c9d55d3e29ebb0982",
      "sha256:359ac3183bf68fc383ad79173f9802f8c81931a23ddd626d0aeb1e94510c0143",
      "sha256:e5781361ff4875fc5bafe4921d8949f1aeffc220b3a97623f80eb3a515cf606c",
      "sha256:74e800daa69bccd7009a794a3f9c50f845de940142f2e6d0b880ec9060328f1f"
    ]
  },
  "history": [
    {
      "created": "2023-03-29T18:19:37.625607335Z",
      "created_by": "/bin/sh -c #(nop) ADD file:9663235f252e072c52b0f9e25845841e4321cce2caa7467a0d736c6003b05c00 in / "
    },
    {
      "created": "2023-03-29T18:19:37.727267748Z",
      "created_by": "/bin/sh -c #(nop)  CMD [\"/bin/sh\"]",
      "empty_layer": true
    },
    {
      "created": "2023-03-31T16:27:54.026826946+08:00",
      "created_by": "RUN /bin/sh -c apk update && apk upgrade && apk add openssh-server # buildkit",
      "comment": "buildkit.dockerfile.v0"
    },
    {
      "created": "2023-03-31T16:27:54.409712196+08:00",
      "created_by": "RUN /bin/sh -c ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key -P \"\" # buildkit",
      "comment": "buildkit.dockerfile.v0"
    },
    {
      "created": "2023-03-31T16:27:54.794528568+08:00",
      "created_by": "RUN /bin/sh -c passwd -d root # buildkit",
      "comment": "buildkit.dockerfile.v0"
    },
    {
      "created": "2023-03-31T16:44:49.685353442+08:00",
      "created_by": "LABEL enc_key=HUlOu8NWz8si11OZUzUJMnjiq/iZyHBJZMSD3BaqgMc=",
      "comment": "buildkit.dockerfile.v0",
      "empty_layer": true
    },
    {
      "created": "2023-03-31T16:44:49.685353442+08:00",
      "created_by": "LABEL ssh_key=-----BEGIN OPENSSH PRIVATE KEY-----b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAAAMwAAAAtzc2gtZWQyNTUxOQAAACAfiGV2X4o+6AgjVBaY/ZR2UvZp84dVYF5bpNZGMLylQwAAAIhawtHJWsLRyQAAAAtzc2gtZWQyNTUxOQAAACAfiGV2X4o+6AgjVBaY/ZR2UvZp84dVYF5bpNZGMLylQwAAAEAwWYIBvBxQZgk0irFku3Lj1Xbfb8dHtVM/kkz/Uz/l2h+IZXZfij7oCCNUFpj9lHZS9mnzh1VgXluk1kYwvKVDAAAAAAECAwQF-----END OPENSSH PRIVATE KEY-----",
      "comment": "buildkit.dockerfile.v0",
      "empty_layer": true
    },
    {
      "created": "2023-03-31T16:44:49.685353442+08:00",
      "created_by": "COPY ccv0-ssh.pub /root/.ssh/authorized_keys # buildkit",
      "comment": "buildkit.dockerfile.v0"
    },
    {
      "created": "2023-03-31T16:44:49.685353442+08:00",
      "created_by": "ENTRYPOINT [\"/bin/sh\" \"-c\" \"/usr/sbin/sshd -D\"]",
      "comment": "buildkit.dockerfile.v0",
      "empty_layer": true
    }
  ]
}

If you look in the rootfs section, you can see that there is a hash for each layer. These hashes are the same for the encrypted and unencrypted image. This is probably what containerd is using to cache images.

This seems a little bit scary at first, because it allows containerd to unpack an encrypted image without the key, but that will only work if the unencrypted image is already present on the host. While this is technically an information leak about the unencrypted image that could be leveraged in brute-force attacks, it probably isn't something we should worry about.

That said, we should update our tests to make sure we don't use identical encrypted and unencrypted images because that can create false positives.

Btw, note that basically everything specified in the Dockerfile gets added to this config, which is not protected or validated. (Don't worry, that private key is supposed to be there for the demo.)

@fidencio (Member) commented Oct 5, 2023

That said, we should update our tests to make sure we don't use identical encrypted and unencrypted images because that can create false positives.

Please, take a look at: https://cloud-native.slack.com/archives/C039JSH0807/p1695618313572309?thread_ts=1695591000.697989&cid=C039JSH0807 (2nd try).

IOW, we can update the tests, of course; or we can simply play with the order of the tests, or we can simply delete the images in between the different runs. This is well known info, no news here.

@mkulke commented Oct 5, 2023

If you look in the rootfs section, you can see that there is a hash for each layer. These hashes are the same for the encrypted and unencrypted image. This is probably what containerd is using to cache images.

Which images are you comparing specifically? Using your above examples, there don't seem to be common hashes:

[screenshot: layer digests from the two image manifests, with no hashes in common]

@stevenhorsman (Member) commented

IOW, we can update the tests, of course; or we can simply play with the order of the tests, or we can simply delete the images in between the different runs. This is well known info, no news here.

I've also added

One thing worth noting with this is that to avoid sha caching/matching, we want to ensure that the content of our encrypted image is slightly different to the unencrypted one.

to the issue I've created for merging encrypted image support to main (kata-containers/kata-containers#8111) so that we do this better in the next round.
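One way to do that (a sketch; the image names, key file, and Dockerfile tweak below are illustrative, not the actual CI setup):

# Add a marker layer so the encrypted variant's rootfs diff_ids differ from the
# unencrypted one, then encrypt the layers while copying to the registry.
printf 'RUN echo encrypted-variant > /image-variant\n' >> Dockerfile.encrypted
docker build -t test-container:pre-encryption -f Dockerfile.encrypted .
skopeo copy --encryption-key jwe:./image_key.pub \
    docker-daemon:test-container:pre-encryption \
    docker://ghcr.io/confidential-containers/test-container:encrypted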

@zvonkok (Member) commented Oct 5, 2023

Has anyone looked into using the Integrity Measurement Architecture (IMA) and the Extended Verification Module (EVM) to measure and appraise the host?
We have a gap after secure boot and before CoCo attestation where the host essentially can do "anything" to attack the CoCo stack.
With IMA/EVM (which can be used with a TPM as well) we could lock configurations, protect binaries from being replaced, and minimize other attack vectors from the host.
We should also think about how we're going to attest higher-level SW components like the CoCo operator; a malicious operator could add mutating webhooks to mangle the API before it reaches the API server.

https://wiki.gentoo.org/wiki/Integrity_Measurement_Architecture
https://wiki.gentoo.org/wiki/Extended_Verification_Module

@fitzthum (Member, Author) commented Oct 5, 2023

Which images are you comparing specifically? Using your above examples, there don't seem to be common hashes:

I am looking at ghcr.io/confidential-containers/test-container and comparing the unencrypted and the encrypted or multi-arch-encrypted tags. When you look at skopeo inspect, the hashes are different. I did a skopeo copy of the image into a local registry and then looked at the file for the top layer of each image. That is what I posted above. There the hashes are the same. I'm not an OCI wizard, so I'm not totally sure what the difference is. Maybe one is the hash of the binary blob of the layer and the other is the hash of the layer itself (not sure that distinction even makes sense)?
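(For what it's worth, the difference is most likely that the manifest's layer digests cover the compressed -- and, for the encrypted tag, encrypted -- blobs, while the config's rootfs.diff_ids cover the uncompressed layer tars, which are identical here because the content is the same. Assuming skopeo and jq are available and the tags point at single-arch manifests, the two views can be compared like this:)

# Manifest layer digests (compressed/encrypted blobs) -- these differ:
skopeo inspect --raw docker://ghcr.io/confidential-containers/test-container:encrypted | jq '.layers[].digest'
skopeo inspect --raw docker://ghcr.io/confidential-containers/test-container:unencrypted | jq '.layers[].digest'

# Config rootfs.diff_ids (uncompressed layer tars) -- these match:
skopeo inspect --config docker://ghcr.io/confidential-containers/test-container:encrypted | jq '.rootfs.diff_ids'
skopeo inspect --config docker://ghcr.io/confidential-containers/test-container:unencrypted | jq '.rootfs.diff_ids'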

@fitzthum (Member, Author) commented Oct 5, 2023

Has anyone looked into using the Integrity Measurement Architecture (IMA) and the Extended Verification Module (EVM) to measure and appraise the host?

I have thought about this some. My general feeling is that we should stick to our current architecture where the host is deliberately not trusted. I think this is a feature of the project rather than a shortcoming. When we start to think about measuring the host via a TPM, it opens a huge can of worms. For instance it's not really clear how the client should validate these extra measurements, particularly given that they don't operate the host.

@zvonkok (Member) commented Oct 5, 2023

I do not see how this changes the architecture if we reduce the attack vector; the host is still deliberately untrusted. You do not have to use the TPM at all. Can you elaborate on the huge can of worms?
Who/what is the client in your mental model?
We're deploying critical components of the stack on the host; even if it is deliberately not trusted, aren't we trusting the host to some extent to do the right thing?
The CoCo stack, e.g., could provide the policy for how to enforce IMA/EVM.

@fitzthum (Member, Author) commented Oct 5, 2023

I do not see how this changes the architecture if we reduce the attack vector; the host is still deliberately untrusted. You do not have to use the TPM at all.

Maybe I'm not following exactly what you are proposing, but I am assuming that if you want to use something like IMA to validate the host, you will want that to be built on some kind of root of trust. How would you do something like that without a TPM?

Who/what is the client in your mental model?

Generally the client is the person who brings the workload and validates the measurement. There is a little bit of ambiguity between a workload owner and a data provider, but basically we are talking about the guest and whoever it is operating on behalf of.

We're deploying critical components of the stack on the host even if it is deliberately not trusted, aren't we trusting the host to some extent to do the right thing?

No. The hardware evidence provided to the guest guarantees that the PodVM is in a valid state. If a host does not provide a valid PodVM, it is a denial of service attack. Prevention of DoS is a non-goal of the project (because it is a non-goal of confidential computing). There are critical components on the host, but they are untrusted.

The CoCo stack, e.g., could provide the policy for how to enforce IMA/EVM.

The CoCo operator installs some things on the host, changes some configurations, and adds runtime classes to Kubernetes. Other than that, the project is mainly agnostic to what is running on the host. It would be a significant shift to begin mandating a broader host configuration. In theory we would only need to validate the pieces installed by the operator, but realistically we would need to measure more than that to get firm guarantees.

When it comes to the PodVM, it's relatively easy for the project to define what it should look like and for the client to know what to expect from the measurement. When it comes to the host, there is a huge variety of valid configurations. One particular issue is that some deployments will rely on proprietary host components, such as proprietary hypervisors. Validating that as a guest owner would be very difficult.

Another thing that simplifies measuring the PodVM is that it is static. When we start thinking about measuring the host, we run into questions about updates. What happens if the host is updated while a pod is running? Would we have to re-attest it? However you slice it, our attestation mechanism would be getting a lot more complicated.

These are just the first questions that come to mind, but I think the main issue is higher level. I see host measurements and confidential computing as fundamentally different approaches. That said, I think it would actually make sense in some cases to use both. Host measurement could be an additional level of security. This should really be set up by the CSP, however, or whoever owns the nodes. I think it is way out of scope for our project. Keep in mind that we'd have to figure it out for peer pods and enclave-cc as well.

@jiangliu jiangliu closed this as completed Oct 9, 2023
@jiangliu jiangliu reopened this Oct 9, 2023
@fitzthum (Member, Author) commented

Btw, I have created a simple reproducible example of an attack that takes advantage of unvalidated mounts. It is a little bit more circuitous than what I originally described, but hopefully I will be able to refine it a bit in the next few days. I am waiting to share the code, but if anyone is particularly interested, let me know and we can discuss. We should probably start formalizing a disclosure process.
