
[RFC] Attestation Agent Proposal #254

Closed
jiazhang0 opened this issue Jul 20, 2021 · 32 comments

@jiazhang0
Member

jiazhang0 commented Jul 20, 2021


  • Add KBC module "EAA" (Enclave Attestation Architecture).
  • Use the term "KBC" instead of "KBS module/plugin" to refer to the component integrated into AA for communicating with the platform-specific KBS
  • Update collected suggestions from @hdxia and @fitzthum
  • Rename SEV local attestation to SEV pre-attestation
  • Add KBS modularization diagram

Summary

This proposal describes the implementation of the attestation agent, with the goal of providing an E2E attestation reference implementation for Kata CC v0. This RFC details the encryption and decryption procedures and introduces the design options for implementing the attestation agent for Kata CC v0.

Background

In the Kata CC v0 architecture, encryption and decryption operations are performed by different entities.

During image encryption, image creation tools such as skopeo, buildah and ctr-enc call ocicrypt, which uses a Layer Encryption Key (LEK) to encrypt each image layer. In the underlying implementation, ocicrypt generates the LEK randomly, serializes it together with its encryption parameters into a PrivateLayerBlockCipherOptions object, and then encrypts the PrivateLayerBlockCipherOptions (PLBCO for short) object by calling the WrapKey API defined by the key provider protocol. The details of the encryption process, and of the returned annotation packet containing the encrypted PrivateLayerBlockCipherOptions object, are entirely determined by the implementation of the Key Broker Service (KBS for short). Finally, the annotation packet is stored in the image layer's annotations field (for example, the annotation ID can be org.opencontainers.image.enc.keys.provider.kata_cc_key_broker_foo).

During image decryption, kata-agent calls ocicrypt-rs to retrieve the plain (decrypted) PrivateLayerBlockCipherOptions object. In the underlying implementation, ocicrypt-rs calls the UnWrap API implemented by the Attestation Agent (AA for short). AA accesses the KBS according to the parameters in the annotation packet to decrypt the PrivateLayerBlockCipherOptions object, and returns the decrypted object to ocicrypt-rs as the return value of the UnWrap API. Finally, ocicrypt-rs decrypts the encrypted image layer using the LEK from the PrivateLayerBlockCipherOptions object.

The above workflow is especially suitable for remote attestation procedures that support dynamic (runtime) measurement, such as Intel TDX.

[Figure: remote attestation workflow]

For the pre-attestation procedure of SEV/SEV-ES, which only supports static measurement, AA only needs to access the guest FW (or a kernel module driver) to get the plain PrivateLayerBlockCipherOptions object that was provisioned into the guest during the pre-attestation stage. This is the so-called pre-attestation (or local attestation, as mentioned on P19 of https://docs.google.com/presentation/d/1469nSRFtlHMSDDDWVLj0i21dR9M_3SO76ehQsyVSTUk/edit#slide=id.gdccf80c723_0_1261).

[Figure: pre-attestation workflow]

In the long term, AA will do far more than decrypt PrivateLayerBlockCipherOptions objects. For example, it can periodically report the TCB status to the relying party. Therefore, AA should be implemented as a long-lived service. This proposal suggests implementing AA as a gRPC endpoint instead of a standalone binary program.

Goal

AA is specially designed for the Kata CC architecture, so its initial goal is to decrypt the PrivateLayerBlockCipherOptions object according to the input parameters defined by the key provider protocol. This is the only high-level function that AA must implement in the v0 architecture, which also means that AA does not need to implement the WrapKey API.

Further work includes:

  • Implement the UnWrap API defined by the key provider protocol
    Deserialize and parse the input parameters of the UnWrap API, then serialize the PrivateLayerBlockCipherOptions object and return it to ocicrypt-rs.

  • Support both remote attestation and pre-attestation
    Some HW-TEEs obtain the PrivateLayerBlockCipherOptions object through pre-attestation, while others obtain it through remote attestation.

  • Abstract the procedure of PrivateLayerBlockCipherOptions decryption
    The decryption procedure is implemented by the KBS and also involves access to an attestation service.

Internal

Parse input parameter of UnWrapKey API

The format of input parameter of UnWrapKey API is KeyProviderKeyWrapProtocolInput.

{
  "op": "keyunwrap",
  "keyunwrapparams": {
    "dc": "$scheme[:$parameters]",
    "annotation": "$KBS_specific_annotation_packet"
  }
}

where:

  • $scheme and the optional $parameters are specific to the implementation of the callee of ocicrypt-rs. In ocicrypt, $scheme is specified in the key provider configuration file, and $parameters is specified on the command line of image creation tools such as ctr-enc, skopeo and buildah. See examples for the details. In Kata CC, the configuration file may preferably use the pattern "kata_cc_attestation:$mode", where $mode is either local or remote, corresponding to pre-attestation and remote attestation respectively.

  • $KBS_specific_annotation_packet is generated during image encryption and stored in the layer annotation. Its format is specific to the KBS implementation; a good example is the packet produced by the keyprovider test program.

Example:

{
  "op": "keyunwrap",
  "keyunwrapparams": {
    "dc": "kata_cc_attestation_agent:kbc=my_kbc",
    "annotation": "{ \"url\": \"https://$domain:port/api/getkey\", \"keyid\": \"foo\", \"payload\": \"encrypted_PLBCO\" }"
  }
}
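A minimal Go sketch of how AA might parse this input, assuming the JSON schema proposed above (the real ocicrypt/ocicrypt-rs structs may differ) and assuming $parameters is a comma-separated list of key=value pairs. The annotation is passed through untouched so that only the selected KBC interprets it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Types follow the schema proposed in this RFC, not necessarily ocicrypt's own structs.
type KeyUnwrapParams struct {
	DC         string `json:"dc"`
	Annotation string `json:"annotation"`
}

type KeyProviderInput struct {
	Op              string          `json:"op"`
	KeyUnwrapParams KeyUnwrapParams `json:"keyunwrapparams"`
}

// parseDC splits "$scheme[:$parameters]" and turns "k=v,k2=v2" into a map.
func parseDC(dc string) (string, map[string]string) {
	scheme, rest, _ := strings.Cut(dc, ":")
	params := map[string]string{}
	if rest == "" {
		return scheme, params
	}
	for _, kv := range strings.Split(rest, ",") {
		k, v, _ := strings.Cut(kv, "=")
		params[k] = v
	}
	return scheme, params
}

// parseInput returns the KBC name to dispatch to and the opaque annotation.
func parseInput(raw []byte) (string, string, error) {
	var in KeyProviderInput
	if err := json.Unmarshal(raw, &in); err != nil {
		return "", "", err
	}
	if in.Op != "keyunwrap" {
		return "", "", fmt.Errorf("unsupported op %q", in.Op)
	}
	scheme, params := parseDC(in.KeyUnwrapParams.DC)
	if scheme != "kata_cc_attestation_agent" {
		return "", "", fmt.Errorf("unexpected scheme %q", scheme)
	}
	kbc, ok := params["kbc"]
	if !ok {
		return "", "", errors.New("no kbc parameter in dc")
	}
	return kbc, in.KeyUnwrapParams.Annotation, nil
}

func main() {
	raw := []byte(`{
  "op": "keyunwrap",
  "keyunwrapparams": {
    "dc": "kata_cc_attestation_agent:kbc=my_kbc",
    "annotation": "{ \"url\": \"https://$domain:port/api/getkey\", \"keyid\": \"foo\", \"payload\": \"encrypted_PLBCO\" }"
  }
}`)
	kbc, annotation, err := parseInput(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(kbc)        // my_kbc
	fmt.Println(annotation) // handed unchanged to the selected KBC
}
```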

Handle the return value of UnWrapKey API

Whatever method AA uses to obtain the plain (decrypted) PrivateLayerBlockCipherOptions object, AA needs to serialize it into the following JSON object and use it as the return value of the UnWrapKey API.

{
  "keyunwrapresults": {
    "optsdata": "#plain_PLBCO"
  }
}
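A sketch of building that return value in Go. Assumption: optsdata carries the serialized PLBCO bytes; declaring the field as []byte makes encoding/json emit it base64-encoded, which we believe matches ocicrypt's Go types but which should be checked against the key provider protocol.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KeyUnwrapResults and KeyProviderOutput follow the JSON shape shown above.
type KeyUnwrapResults struct {
	OptsData []byte `json:"optsdata"` // serialized plain PLBCO, base64 in JSON
}

type KeyProviderOutput struct {
	KeyUnwrapResults KeyUnwrapResults `json:"keyunwrapresults"`
}

// buildUnwrapOutput wraps the plain PLBCO bytes into the UnWrapKey response.
func buildUnwrapOutput(plainPLBCO []byte) ([]byte, error) {
	return json.Marshal(KeyProviderOutput{KeyUnwrapResults: KeyUnwrapResults{OptsData: plainPLBCO}})
}

func main() {
	out, err := buildUnwrapOutput([]byte(`{"symkey":"..."}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```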

Abstract the procedure of PrivateLayerBlockCipherOptions decryption

Treat the KBS as a service that provides PrivateLayerBlockCipherOptions decryption. A KBS can be abstracted as one of the following types (not limited to these):

  • Relying party model with remote attestation
  • Guest FW model with pre-attestation
  • Others: cloud HSM ...

There are two options to implement this abstraction.

Option 1: KBC (Key Broker Client) modularization

[Figure: KBS modularization diagram]

The KBS is a platform-specific implementation, so AA needs to define and implement a modularization framework that allows platform providers to communicate with their own KBS infrastructure through a corresponding KBC integrated into AA.

In this scheme, each KBC module needs to implement the following functions:

  • Function 1: implement a platform specific client for KBS.
    AA doesn't need to care about the detail of communication protocol between KBS and KBC. The KBC selection can be done in this way:

    {
      "op": "keyunwrap",
      "keyunwrapparams": {
        "dc": "kata_cc_attestation_agent:kbc=my_kbc",
        "annotation": "{ \"url\": \"https://$domain:port/api/getkey\", \"keyid\": \"foo\", \"payload\": \"encrypted_PLBCO\" }"
      }
    }
  • Function 2: define and implement the communication protocol between KBS and KBC.
    Include application protocol, transport type, API scheme, input and output parameters, etc.

  • Function 3: implement the corresponding attester logic for all potentially supported HW-TEE types
    AA, acting in the Attester role defined by the RATS architecture, is responsible for collecting evidence about the TCB status from the attesting environment and reporting it to the verifier or relying party for verification. The purpose is to convince the tenant that the workload is indeed running in a genuine HW-TEE. To bind the evidence (called a quote in TDX) to a user-defined data structure (Enclave Held Data, EHD for short), the hash of the EHD is embedded into the evidence, and the evidence plus the EHD is sent to the remote peer. Usually, the EHD is a public key used for wrapping a secret.
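Option 1 could look roughly like the following Go registry sketch. The KBC interface, Register and unwrapWith are hypothetical names; the point is that AA only maps the kbc= parameter to a module and hands it the opaque annotation, so Functions 2 and 3 stay inside the module.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// KBC is a hypothetical interface that every Key Broker Client module
// implements. The module owns the KBS protocol (Function 2) and the attester
// logic (Function 3); AA never parses the annotation itself.
type KBC interface {
	// DecryptPayload turns a KBS-specific annotation packet into a plain PLBCO.
	DecryptPayload(annotation []byte) ([]byte, error)
}

var registry = make(map[string]func() KBC)

// Register would be called by each compiled-in KBC module, e.g. from init().
func Register(name string, factory func() KBC) {
	registry[name] = factory
}

// unwrapWith dispatches to the KBC named by the "kbc=" parameter of "dc".
func unwrapWith(kbcName string, annotation []byte) ([]byte, error) {
	factory, ok := registry[kbcName]
	if !ok {
		return nil, fmt.Errorf("unknown KBC %q (registered: %v)", kbcName, registeredNames())
	}
	return factory().DecryptPayload(annotation)
}

func registeredNames() []string {
	names := make([]string, 0, len(registry))
	for n := range registry {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// sampleKBC is a toy module that returns the annotation unchanged instead of
// talking to a real KBS.
type sampleKBC struct{}

func (sampleKBC) DecryptPayload(annotation []byte) ([]byte, error) {
	if len(annotation) == 0 {
		return nil, errors.New("empty annotation packet")
	}
	return annotation, nil
}

func main() {
	Register("sample_kbc", func() KBC { return sampleKBC{} })
	plbco, err := unwrapWith("sample_kbc", []byte(`{"symkey":"..."}`))
	fmt.Println(string(plbco), err)
}
```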

Option 2: AA-KBS E2E

Explicitly provide an implementation of AA and KBS for kata CC.

In this scheme, AA will eventually implement all of the KBS types mentioned above as required. Functions 2 and 3 become internal details between AA and the KBS.

Comparison

  • Compared with option 2, option 1 requires each remote-attestation KBC module to implement the attester logic for all potential HW-TEEs. This is wasteful, since the attester logic for a specific HW-TEE only needs to be implemented once.

  • The KBS is a platform-specific implementation, so option 1 offers greater flexibility than option 2, and AA doesn't need to care about any KBS implementation details (for example, AA doesn't need to parse the annotation data in the input parameter of the UnWrap API).

  • A Cloud HSM KBC is also implementation-specific, so the Cloud HSM KBC in option 1 actually needs to implement a modular subsystem to support different Cloud HSMs.

  • At present, most existing attester implementations are written in languages other than Rust, so option 2 requires AA to integrate potentially unsafe code. This is a particular concern for software running in a HW-TEE. Although option 1 has a similar problem, at least the KBC modules and AA are separated, and platform providers focusing on security will try their best to implement KBC modules in Rust in the long term.


Collected Suggestions

  • A Status() method to the KBS API would be useful to handle slow remote attestations and attestation failures. - by @jimcadden

  • One extension to the integrity model for layer encryption would be to verify the decrypted PLBCO (e.g., with a digest provided by the KSM) prior to decrypting the layer. However, this seems like it could be contained within the implementation of a KBC. - by @jimcadden

  • The AA will send the keyID of the KEK to KBS and KBS releases the wrapped KEK to AA (AA needs to generate a pub/priv so that the KBS can wrap the KEK using AA's pub key for protection). Once the AA gets the KEK, it can locally unwrap the LEK and feed to ocicrypt. The advantage of passing the KEK to AA is that the KEK is usually shared among multiple layers and once AA retrieves the KEK, it can cache it to avoid multiple round trips to KBS to decrypt each LEK. - by @hdxia

  • SEV-SNP does not support pre-attestation in the same way that SEV(-ES) does. One thing I was trying to get across in the previous post is that pre-attestation and SNP attestation both provide the launch measurement to the GOP/KBS in exchange for a key. This should be fairly similar to the TDX approach except that the SEV-SNP measurement has slightly different properties (meaning that we might need additional support to measure containers). - by @fitzthum

jiazhang0 reopened this Jul 20, 2021
@fitzthum
Member

fitzthum commented Jul 20, 2021

@jimcadden @rudyjantz

Generally, I think we are on the same page here, but let me make a few notes.

First, a bit more detail on the SEV and SEV-ES cases, which we plan to support for v0. @dubek has been working on extending the SEV launch measurement to include the kernel, kernel params, and initrd. By default the SEV launch measurement only includes the firmware. By extending the measurement to the initrd et al, we can verify the kata agent. Patches for this are on the list here and here.

SEV and SEV-ES use pre-launch attestation meaning that during the boot process the launch measurement can be queried and a secret can be securely injected to a guest physical address. @dubek is also working on a kernel module that allows us to read this secret from userspace. This can be found here.

Now, for a slightly higher-level view, the way I like to think about the attestation agent is that it provides userspace key wrapping and unwrapping inside an attested guest by communicating with a trusted third party over a secure channel. For SEV-SNP (I can elaborate on this in a separate issue) the AA sets up the secure channel from inside the guest and drives the attestation. With SEV(-ES) the secure channel is provided by the firmware (and the PSP) and the AA doesn't need to do anything to trigger the attestation. By the time the guest has booted, the attestation will have already been performed and a secret will be available via the securityfs if the measurement was correct. For both SEV(-ES) and SEV-SNP the boot measurement is verified by a trusted third party that conditionally provides keys. We have been referring to this third party as the Guest Owner Proxy (GOP). You call it the KBS.

The difference between SEV(-ES) and SEV-SNP attestation isn't so much that one of them is local and the other remote, but that the secure channel is set up differently and the attestation is triggered differently. In the SEV(-ES) case we will need some extra tooling on the host (an extension of the Kata Runtime) that will trigger the measurement and send it to the GOP/KBS.

Here is a rough diagram we have been using internally. This shows the SEV(-ES) case where the measurement comes from the PSP, through the HV and the Kata Runtime, to the GOP/KBS. After verifying the measurement the GOP/KBS injects a secret that makes its way to the AA and ocicrypt-rs. For SEV-SNP the AA will talk directly with the GOP/KBS.
[Figure: kata-diagram, SEV(-ES) attestation flow]

@fitzthum
Member

As for the two options, we have been assuming that the Attestation Agent would support all applicable platforms, but that the GOP/KBS would be platform specific. There will likely be a GOP for Intel and one for AMD that, while similar, probably won't be exactly the same. It could make sense to take a modular approach to the Attestation Agent where the AA has a generic interface to modules that know how to communicate with corresponding GOP. On the other hand, if there are only two protocols to support, it might be sufficient to implement each one in the AA itself. If someone wants to add another they can patch the AA.

@jimcadden
Member

Thanks, @jiazhang0! Your RFC aligns nicely with what we have been discussing here at IBM. I support "Option 1: KBS modularization" as the better choice for the architecture-specific KBS.

A few initial questions/comments about your design:

  1. Do you assume a single PLBCO for every encrypted container in a pod? If not, we need a way for "keyid": "foo" in UnWrapKey() to specify the container being deployed.
  2. Do you measure the container prior to decryption, i.e., for integrity protection?
  3. A Status() method to the KBS API would be useful to handle slow remote attestations and attestation failures.
  4. A strategy question: for the Kata V0 prototype, you propose that the kata-agent calls into ocicrypt-rs directly. This would require new functionality built into kata-agent (e.g., image pull, unbundle, etc.) and ocicrypt-rs (e.g., keyprovider, grpc, etc.). Alternatively, the V0 prototype could reuse existing tools like skopeo within the guest and make calls into the (golang) ocicrypt.

@jiazhang0
Member Author

@fitzthum Thanks for the info in your notes. I'm familiar with SEV/SEV-ES attestation, but not with SEV-SNP attestation, so I have an initial question about it:

It sounds like SEV-SNP not only supports runtime/dynamic attestation like TDX, but also supports pre-launch attestation (aka pre-attestation) to verify the boot measurements. Are you planning to use the same approach of loading the kernel, kernel parameters and initrd during pre-launch attestation for SEV-SNP as for SEV/SEV-ES in Kata CC v0?

@jiazhang0
Member Author

@jimcadden

> Thanks, @jiazhang0! Your RFC aligns nicely with what we have been discussing here at IBM. I support "Option 1: KBS modularization" as the better choice for the architecture-specific KBS.

> A few initial questions/comments about your design:

> 1. Do you assume a single PLBCO for every encrypted container in a pod? If no, we need a way for `"keyid": "foo"` in `UnWrapKey()` to specify the container being deployed.

Theoretically, each encrypted layer can contain multiple PLBCOs in the form of annotations corresponding to 1) different KeyIDs provided by different tenants, or 2) multiple KeyIDs provided by a single tenant. So whether a single PLBCO or multiple PLBCOs are used for every encrypted container image in a pod is up to the image creator.

Therefore, an encrypted PLBCO with all necessary info (e.g., KeyID and any other platform-specific artifacts) needs to be wrapped into a KBC-specific annotation packet. I believe the association info you mentioned should be recorded in the KBC-specific annotation packet.

However, I'm not sure about the need to associate the container being deployed with a KeyID. Could you clarify the details?

> 2. Do you measure the container prior to decryption, i.e., for integrity protection?

I'm not exactly sure what your question is asking. About integrity protection: the newly added annotation ID "org.opencontainers.image.enc.pubopts" contains a field called "hmac" used to verify the integrity of the encrypted layer blob data, and the conventional "digest" field defined in the layer descriptor is used to verify the integrity of the plain (decrypted) layer blob data. Both checks are applied automatically during image decryption, so it looks like AA doesn't need to do anything about integrity protection, unless I am missing something important.

> 3. A `Status()` method to the KBS API would be useful to handle slow remote attestations and attestation failures

Makes sense.

> 4. A strategy question, for the Kata V0 prototype, you propose the `kata-agent` calls into `ocicrypt-rs` directly. This would require new functionality built into `kata-agent` (e.g., image pull, unbundle, etc.) and `ocicrypt-rs` (e.g., keyprovider, grpc, etc.) Alternatively, the V0 prototype could reuse existing tools like `skopeo` within the guest and make calls into the (golang) `ocicrypt`

Gerry will provide the details in this thread about the work on the implementation of pullImage() (and etc) in kata-agent.

@jiazhang0
Member Author

> As for the two options, we have been assuming that the Attestation Agent would support all applicable platforms, but that the GOP/KBS would be platform specific. There will likely be a GOP for Intel and one for AMD that, while similar, probably won't be exactly the same. It could make sense to take a modular approach to the Attestation Agent where the AA has a generic interface to modules that know how to communicate with corresponding GOP. On the other hand, if there are only two protocols to support, it might be sufficient to implement each one in the AA itself. If someone wants to add another they can patch the AA.

Agreed, and I believe there would be more modules with different protocols to communicate with corresponding GOP/KBS.

@sameo
Member

sameo commented Jul 21, 2021

cc @dcmiddle @hdxia

@fitzthum
Member

@jiazhang0

> It sounds like SEV-SNP not only supports runtime/dynamic attestation like TDX, but also supports pre-launch attestation (aka pre-attestation) to verify the boot measurements. Are you planning to use the same approach of loading the kernel, kernel parameters and initrd during pre-launch attestation for SEV-SNP as for SEV/SEV-ES in Kata CC v0?

SEV-SNP attestation is completely different from SEV(-ES) except for one key detail; both provide the launch measurement as part of the attestation. This means that our OVMF/QEMU patches that extend the launch measurement will be useful for both SEV(-ES) and SEV-SNP. In both cases the launch measurement is calculated only once based on the initial state of the VM. With SEV(-ES) the launch measurement must be queried by the guest owner during launch, but with SEV-SNP, it can be retrieved anytime from within the guest as part of an attestation report. This attestation report includes a few other things (versioning info for firmware, 512 bits of arbitrary guest-provided data, etc.). The attestation agent will get this attestation report from the PSP and send it to the GOP/KBS. I think this is fairly similar to TDX.
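The EHD binding described in the RFC (hash of the EHD embedded in the evidence) maps naturally onto this 512-bit guest-provided field. A Go sketch; placing SHA-256(EHD) in the first 32 bytes and zero-padding the rest is an assumed convention, and verifying the report signature itself is out of scope here.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
	"fmt"
)

// reportData builds the 64-byte (512-bit) guest-provided report field from
// the Enclave Held Data, e.g. the AA's public key (assumed layout).
func reportData(ehd []byte) [64]byte {
	var rd [64]byte
	sum := sha256.Sum256(ehd)
	copy(rd[:], sum[:])
	return rd
}

// verifierCheck is what a GOP/KBS could do after validating the report
// signature: recompute the binding from the EHD sent alongside the evidence.
func verifierCheck(rd [64]byte, ehd []byte) error {
	want := reportData(ehd)
	if !bytes.Equal(rd[:], want[:]) {
		return errors.New("EHD is not bound to this evidence")
	}
	return nil
}

func main() {
	ehd := []byte("AA public key bytes")
	rd := reportData(ehd)
	fmt.Println(verifierCheck(rd, ehd), verifierCheck(rd, []byte("attacker key")))
}
```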

SEV-SNP does not support pre-attestation in the same way that SEV(-ES) does. One thing I was trying to get across in the previous post is that pre-attestation and SNP attestation both provide the launch measurement to the GOP/KBS in exchange for a key. The mechanism for doing so is totally different.

@jiazhang0
Member Author

@fitzthum Fairly clear. Thanks!

@jimcadden
Member

@jiazhang0

> Theoretically, each encrypted layer can contain multiple PLBCOs in the form of annotations corresponding to 1) different KeyIDs provided by different tenants, or 2) multiple KeyIDs provided by a single tenant. So whether a single PLBCO or multiple PLBCOs are used for every encrypted container image in a pod is up to the image creator.

> Therefore, an encrypted PLBCO with all necessary info (e.g., KeyID and any other platform-specific artifacts) needs to be wrapped into a KBC-specific annotation packet. I believe the association info you mentioned should be recorded in the KBC-specific annotation packet.

> However, I'm not sure about the need to associate the container being deployed with a KeyID. Could you clarify the details?

Yes, I agree. Thank you for clarifying.

> About integrity protection: the newly added annotation ID "org.opencontainers.image.enc.pubopts" contains a field called "hmac" used to verify the integrity of the encrypted layer blob data, and the conventional "digest" field defined in the layer descriptor is used to verify the integrity of the plain (decrypted) layer blob data. Both checks are applied automatically during image decryption, so it looks like AA doesn't need to do anything about integrity protection, unless I am missing something important.

The scheme you describe provides a mostly complete solution for integrity. One extension to this model would be to verify the decrypted PLBCO (e.g., with a digest provided by the KSM) prior to decrypting the layer. However, this seems like it could be contained within the implementation of a KBC.

> Gerry will provide the details in this thread about the work on the implementation of pullImage() (and etc) in kata-agent.

Great! We should discuss how we can collaborate on these components, and what else needs attention for V0.

@jimcadden
Member

One question that came up is how the AA/KBS design could prevent unpermitted containers from being deployed? For example, a situation where an untrusted cloud provider attempts to deploy malicious containers alongside the trusted ones. Since the kata-agent has no knowledge of what should-or-should-not be deployed, it must fall to the KBS and/or the external key management to determine which containers are allowed to run. One way this could be enforced is by requiring all containers to be encrypted. Your thoughts?

cc. @jiazhang0 @fitzthum @sameo

@hdxia
Member

hdxia commented Jul 21, 2021

This design of AA and KBS is a bit different from what I thought/proposed earlier. In this design, each layer's LEK has to be sent to the KBS for decryption. The advantage is that the KEK never leaves the KMS/KBS. However, it causes multiple trips to the KBS/KMS for LEK decryption if the container has many layers with different LEKs. In most cases, one KEK is shared among layers to protect the per-layer LEKs, and that is the purpose of having an LEK for each layer. In the architecture I proposed earlier, the AA sends the keyID of the KEK to the KBS, and the KBS releases the wrapped KEK to the AA (the AA needs to generate a public/private key pair so that the KBS can wrap the KEK with the AA's public key for protection). Once the AA gets the KEK, it can locally unwrap the LEK and feed it to ocicrypt. The AA runs inside the TEE, so the KEK is protected after being transferred to the AA, and the KEK transmission from the KBS to the AA is protected by the AA's key pair. So the KEK is safely protected along the way.

The advantage of passing the KEK to AA is that the KEK is usually shared among multiple layers and once AA retrieves the KEK, it can cache it to avoid multiple round trips to KBS to decrypt each LEK.
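A sketch of that caching behavior: one KBS round trip per KEK, then local LEK unwrapping for every layer. kekCache, fetch and the AES-256-GCM LEK wrapping are assumptions for illustration.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
	"sync"
)

// kekCache fetches each KEK from the KBS once and reuses it for every layer
// that references the same key ID. fetch stands in for the attested KBS call.
type kekCache struct {
	mu    sync.Mutex
	keys  map[string][]byte
	fetch func(keyID string) ([]byte, error)
	trips int // number of KBS round trips, for illustration
}

func (c *kekCache) get(keyID string) ([]byte, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if k, ok := c.keys[keyID]; ok {
		return k, nil
	}
	k, err := c.fetch(keyID)
	if err != nil {
		return nil, err
	}
	c.trips++
	c.keys[keyID] = k
	return k, nil
}

// unwrapLEK opens an AES-256-GCM-wrapped LEK locally with the cached KEK.
func (c *kekCache) unwrapLEK(keyID string, nonce, wrapped []byte) ([]byte, error) {
	kek, err := c.get(keyID)
	if err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(kek)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, nonce, wrapped, nil)
}

// seal is the image-build-side counterpart (errors ignored for brevity).
func seal(kek, lek []byte) (nonce, wrapped []byte) {
	block, _ := aes.NewCipher(kek)
	gcm, _ := cipher.NewGCM(block)
	nonce = make([]byte, gcm.NonceSize())
	rand.Read(nonce)
	return nonce, gcm.Seal(nil, nonce, lek, nil)
}

func main() {
	kek := make([]byte, 32)
	rand.Read(kek)
	c := &kekCache{keys: map[string][]byte{}, fetch: func(string) ([]byte, error) { return kek, nil }}
	for i := 0; i < 3; i++ { // three layers whose LEKs share one KEK
		lek := make([]byte, 32)
		rand.Read(lek)
		n, w := seal(kek, lek)
		if _, err := c.unwrapLEK("foo", n, w); err != nil {
			panic(err)
		}
	}
	fmt.Println("KBS round trips:", c.trips) // KBS round trips: 1
}
```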

@hdxia
Member

hdxia commented Jul 21, 2021

> One question that came up is how the AA/KBS design could prevent unpermitted containers from being deployed? For example, a situation where an untrusted cloud provider attempts to deploy malicious containers alongside the trusted ones. Since the kata-agent has no knowledge of what should-or-should-not be deployed, it must fall to the KBS and/or the external key management to determine which containers are allowed to run. One way this could be enforced is by requiring all containers to be encrypted. Your thoughts?

> cc. @jiazhang0 @fitzthum @sameo

The ideal way is for the kata agent to measure which containers are to be loaded. When the kata agent calls into the AA to retrieve the KEK to decrypt the encrypted container, the KBS receives the request and sends an attestation challenge, so that the TEE attestation service can verify the entire stack before releasing the KEK. This requires the measurement to be extended all the way to the kata agent and the containers. Intel TDX supports this. AMD SEV/SEV-ES may not, but AMD SEV-SNP may.

@fitzthum
Member

> This design of AA and KBS is a bit different from what I thought/proposed earlier. In this design, each layer's LEK has to be sent to the KBS for decryption.

Thanks for pointing this out @hdxia. For SEV(-ES) there is no persistent secure connection between the GOP/KBS and the AA. Our plan is to use SEV secret injection to provision a set of keys at boot (given a valid launch measurement). The AA can then use the keys to unwrap the LEKs. This may be what @jiazhang0 is referring to as local attestation, but I am a little unclear on the terminology still.

It would be possible to use secret injection to provision a key that the AA could use to set up a persistent secure channel to the GOP/KBS. Then the AA could request individual keys or relay unwrap commands. Injecting the secrets directly at boot seems easier.

I think the plan is to do something similar for SEV-SNP. Rather than relaying every unwrap request to the GOP/KBS, the GOP/KBS will probably send over a set of keys once the AA has provided a valid attestation report.

I think that usually one of the main goals of remote attestation is to keep the key from traveling, but I don't know if that is our priority here. As you point out, the enclave is trusted.

@fitzthum
Member

> The ideal way is for the kata agent to measure which containers are to be loaded. When the kata agent calls into the AA to retrieve the KEK to decrypt the encrypted container, the KBS receives the request and sends an attestation challenge, so that the TEE attestation service can verify the entire stack before releasing the KEK. This requires the measurement to be extended all the way to the kata agent and the containers. Intel TDX supports this. AMD SEV/SEV-ES may not, but AMD SEV-SNP may.

For all versions of SEV, the launch measurement will include the firmware, initrd, kernel, and kernel params (once our patches get upstream). The launch measurement is calculated at boot, however, and reflects only the initial state of these elements. Containers that are pulled in by the Kata Agent after the guest has booted will not be reflected in the measurement. That said, since the Kata Agent is measured at boot, we can trust it to measure any containers it pulls in and send the measurement to the GOP/KBS.

For SEV(-ES), we don't have a persistent connection to the GOP/KBS, but we can potentially deliver a list of allowed measurements at boot (along with the KEKs).
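A boot-time allowlist check could look like the following Go sketch. It assumes the "measurement" is the SHA-256 of the image manifest and that the allowlist arrives with the KEKs in the injected secret; both are illustrative assumptions rather than anything the thread has settled on.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// allowlist holds container measurements provisioned by the guest owner at boot.
type allowlist map[string]bool

// measure computes the assumed container measurement: SHA-256 of the manifest.
func measure(manifest []byte) string {
	sum := sha256.Sum256(manifest)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// admit would be called by the measuring agent before unpacking an image.
func (a allowlist) admit(manifest []byte) error {
	m := measure(manifest)
	if !a[m] {
		return fmt.Errorf("container %s is not in the provisioned allowlist", m)
	}
	return nil
}

func main() {
	trusted := []byte(`{"schemaVersion":2,"layers":[]}`)
	a := allowlist{measure(trusted): true}
	fmt.Println(a.admit(trusted), a.admit([]byte("malicious manifest")) != nil)
}
```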

It might be a good idea to explicitly measure every container that is pulled into the enclave, but we do not get that for free with any version of SEV. We would need to add extra functionality to the Kata Agent or whatever service pulls the image in the guest and this would need to be coordinated with the AA.

Could we potentially use a signature-based approach instead?

@hdxia
Member

hdxia commented Jul 21, 2021

> This design of AA and KBS is a bit different from what I thought/proposed earlier. In this design, each layer's LEK has to be sent to the KBS for decryption.

> Thanks for pointing this out @hdxia. For SEV(-ES) there is no persistent secure connection between the GOP/KBS and the AA. Our plan is to use SEV secret injection to provision a set of keys at boot (given a valid launch measurement). The AA can then use the keys to unwrap the LEKs. This may be what @jiazhang0 is referring to as local attestation, but I am a little unclear on the terminology still.

> It would be possible to use secret injection to provision a key that the AA could use to set up a persistent secure channel to the GOP/KBS. Then the AA could request individual keys or relay unwrap commands. Injecting the secrets directly at boot seems easier.

> I think the plan is to do something similar for SEV-SNP. Rather than relaying every unwrap request to the GOP/KBS, the GOP/KBS will probably send over a set of keys once the AA has provided a valid attestation report.

> I think that usually one of the main goals of remote attestation is to keep the key from traveling, but I don't know if that is our priority here. As you point out, the enclave is trusted.

Thanks @fitzthum. Besides security, performance is also a big concern when a container is launched. Given the same level of security, we should prioritize performance. Or at least, from a design perspective, we should keep it flexible in case some users are paranoid about security.

@hdxia
Member

hdxia commented Jul 21, 2021

> The ideal way is for the kata agent to measure which containers are to be loaded. When the kata agent calls into the AA to retrieve the KEK to decrypt the encrypted container, the KBS receives the request and sends an attestation challenge, so that the TEE attestation service can verify the entire stack before releasing the KEK. This requires the measurement to be extended all the way to the kata agent and the containers. Intel TDX supports this. AMD SEV/SEV-ES may not, but AMD SEV-SNP may.

> For all versions of SEV, the launch measurement will include the firmware, initrd, kernel, and kernel params (once our patches get upstream). The launch measurement is calculated at boot, however, and reflects only the initial state of these elements. Containers that are pulled in by the Kata Agent after the guest has booted will not be reflected in the measurement. That said, since the Kata Agent is measured at boot, we can trust it to measure any containers it pulls in and send the measurement to the GOP/KBS.

> For SEV(-ES), we don't have a persistent connection to the GOP/KBS, but we can potentially deliver a list of allowed measurements at boot (along with the KEKs).

> It might be a good idea to explicitly measure every container that is pulled into the enclave, but we do not get that for free with any version of SEV. We would need to add extra functionality to the Kata Agent or whatever service pulls the image in the guest and this would need to be coordinated with the AA.

> Could we potentially use a signature-based approach instead?

A signature may be possible, but the CA and the entity that verifies the signature have to be trusted. In this case, both the kata agent and the root CA used to verify the signature have to be measured and attested by the TEE attestation service.

@jiazhang0
Member Author

One question that came up is how the AA/KBS design could prevent unpermitted containers from being deployed? For example, a situation where an untrusted cloud provider attempts to deploy malicious containers alongside the trusted ones. Since the kata-agent has no knowledge of what should-or-should-not be deployed, it must fall to the KBS and/or the external key management to determine which containers are allowed to run. One way this could be enforced is by requiring all containers to be encrypted. Your thoughts?

cc. @jiazhang0 @fitzthum @sameo

Actually, you are talking about unauthorized use of images. I think image encryption plus remote attestation can partly mitigate this issue: it prevents others from running an image at will, and it forces a remote attestation procedure that makes each use auditable by the KBS. With a strong policy-driven mechanism, the KBS can partly limit unauthorized use.
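To make the policy-driven KBS concrete, here is a hypothetical sketch of a KBS that releases a key only when both the attested measurement and the requested key ID are allowed, and that logs every attempt so usage can be audited. The `Policy`, `KBS`, and `ReleaseKey` names are illustrative assumptions, not an existing KBS interface.

```go
package main

import (
	"errors"
	"fmt"
)

// Policy is a hypothetical KBS-side release policy.
type Policy struct {
	AllowedMeasurements map[string]bool // attested guest measurements
	AllowedKeyIDs       map[string]bool // image keys the owner permits to be released
}

// AuditEntry records one key request, granted or not.
type AuditEntry struct {
	KeyID       string
	Measurement string
	Granted     bool
}

// KBS holds the keys, the policy, and an append-only audit log.
type KBS struct {
	policy Policy
	keys   map[string][]byte
	Audit  []AuditEntry
}

// ErrDenied is returned when the policy rejects a request.
var ErrDenied = errors.New("key release denied by policy")

// ReleaseKey returns the key only if the key exists and both the attested
// measurement and the key ID are allowed; every attempt is logged.
func (k *KBS) ReleaseKey(keyID, measurement string) ([]byte, error) {
	key, exists := k.keys[keyID]
	granted := exists &&
		k.policy.AllowedMeasurements[measurement] &&
		k.policy.AllowedKeyIDs[keyID]
	k.Audit = append(k.Audit, AuditEntry{KeyID: keyID, Measurement: measurement, Granted: granted})
	if !granted {
		return nil, ErrDenied
	}
	return key, nil
}

func main() {
	kbs := &KBS{
		policy: Policy{
			AllowedMeasurements: map[string]bool{"measurement-good": true},
			AllowedKeyIDs:       map[string]bool{"kek-1": true},
		},
		keys: map[string][]byte{"kek-1": []byte("secret")},
	}
	// A guest with an unexpected measurement is refused.
	_, err := kbs.ReleaseKey("kek-1", "measurement-unknown")
	fmt.Println(err) // key release denied by policy
	key, err := kbs.ReleaseKey("kek-1", "measurement-good")
	fmt.Println(string(key), err) // secret <nil>
	fmt.Println(len(kbs.Audit))   // 2
}
```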

@jiazhang0
Member Author

Thanks for pointing this out @hdxia. For SEV(-ES) there is no persistent secure connection between the GOP/KBS and the AA. Our plan is to use SEV secret injection to provision a set of keys at boot (given a valid launch measurement). The AA can then use the keys to unwrap the LEKs. This may be what @jiazhang0 is referring to as local attestation, but I am a little unclear on the terminology still.

The term "local attestation" is used on slide 19 (P19) of https://docs.google.com/presentation/d/1469nSRFtlHMSDDDWVLj0i21dR9M_3SO76ehQsyVSTUk/edit#slide=id.gdccf80c723_0_1261, so I used this term.

I understand why you are confused :) The AA for SEV(-ES) obviously just retrieves the injected secret resulting from pre-attestation, rather than initiating a so-called local attestation procedure at that moment. In addition, I first heard the term "local attestation" when I was investigating Intel SGX attestation, where it means that two SGX enclaves on the same node attest each other. What is your preference for its naming?

@hdxia
Member

hdxia commented Jul 22, 2021

This design of AA and KBS is a bit different from what I thought/proposed earlier. In this design, each layer's LEK has to be sent to the KBS for it to decrypt.

Thanks for pointing this out @hdxia. For SEV(-ES) there is no persistent secure connection between the GOP/KBS and the AA. Our plan is to use SEV secret injection to provision a set of keys at boot (given a valid launch measurement). The AA can then use the keys to unwrap the LEKs. This may be what @jiazhang0 is referring to as local attestation, but I am a little unclear on the terminology still.
It would be possible to use secret injection to provision a key that the AA could use to setup a persistent secure channel to the GOP/KBS. Then the AA could request individual keys or relay unwrap commands. Injecting the secrets directly at boot seems easier.
I think the plan is to do something similar for SEV-SNP. Rather than relaying every unwrap request to the GOP/KBS, the GOP/KBS will probably send over a set of keys once the AA has provided a valid attestation report.
I think that usually one of the main goals of remote attestation is to keep the key from traveling, but I don't know if that is our priority here. As you point out, the enclave is trusted.

Thanks @fitzthum. Besides security, performance is also a big concern when a container is launched. Given the same level of security, we should prioritize performance. Or at least, from a design perspective, we should keep it flexible in case some users are paranoid about security.

Also, you are sending the LEK from the KBS to the AA, and there is no difference between sending the KEK and sending the LEK from the KBS to the AA. If you are worried about the security of the KEK, you also need to worry about the LEK, because anyone who gets hold of the LEK can decrypt your layers. That is why in my original diagram we planned to send the KEK rather than the LEK, for performance reasons.

@jiazhang0
Member Author

This design of AA and KBS is a bit different from what I thought/proposed earlier. In this design, each layer's LEK has to be sent to the KBS for it to decrypt. The advantage is that the KEK never leaves the KMS/KBS. However, it causes multiple trips to the KBS/KMS for LEK decryption if the container has many layers with different LEKs. In most cases, the KEK is shared among layers to protect the LEKs, and that is the purpose of having an LEK for each layer. In the architecture I proposed earlier, the AA sends the key ID of the KEK to the KBS, and the KBS releases the wrapped KEK to the AA (the AA needs to generate a public/private key pair so that the KBS can wrap the KEK with the AA's public key for protection). Once the AA gets the KEK, it can locally unwrap the LEKs and feed them to ocicrypt. The AA runs inside the TEE, so the KEK is protected after being transferred to the AA, and the KEK transmission from the KBS to the AA is protected by the AA's key pair. So the KEK is safely protected in transit.

The advantage of passing the KEK to AA is that the KEK is usually shared among multiple layers and once AA retrieves the KEK, it can cache it to avoid multiple round trips to KBS to decrypt each LEK.

@hdxia Thanks for your comments. It is a reasonable implementation option, and what you propose can be implemented as an ISECL KBS module for the AA. Does that make sense to you?

@jiazhang0
Member Author

It would be possible to use secret injection to provision a key that the AA could use to setup a persistent secure channel to the GOP/KBS. Then the AA could request individual keys or relay unwrap commands. Injecting the secrets directly at boot seems easier.

Secret injection happens only after a valid launch measurement has been verified during the guest launch stage, so the guest is already trusted before it executes its first instruction. Is it necessary to inject a dedicated key for setting up a secure channel between the AA and the GOP/KBS? Could the AA instead generate such a key at will for wrapping/unwrapping payloads whenever the AA and the KBS need to communicate?

@jiazhang0
Member Author

This design of AA and KBS is a bit different from what I thought/proposed earlier. In this design, each layer's LEK has to be sent to the KBS for it to decrypt.

Thanks for pointing this out @hdxia. For SEV(-ES) there is no persistent secure connection between the GOP/KBS and the AA. Our plan is to use SEV secret injection to provision a set of keys at boot (given a valid launch measurement). The AA can then use the keys to unwrap the LEKs. This may be what @jiazhang0 is referring to as local attestation, but I am a little unclear on the terminology still.
It would be possible to use secret injection to provision a key that the AA could use to setup a persistent secure channel to the GOP/KBS. Then the AA could request individual keys or relay unwrap commands. Injecting the secrets directly at boot seems easier.
I think the plan is to do something similar for SEV-SNP. Rather than relaying every unwrap request to the GOP/KBS, the GOP/KBS will probably send over a set of keys once the AA has provided a valid attestation report.
I think that usually one of the main goals of remote attestation is to keep the key from traveling, but I don't know if that is our priority here. As you point out, the enclave is trusted.

Thanks @fitzthum. Besides security, performance is also a big concern when a container is launched. Given the same level of security, we should prioritize performance. Or at least, from a design perspective, we should keep it flexible in case some users are paranoid about security.

Also, you are sending the LEK from the KBS to the AA, and there is no difference between sending the KEK and sending the LEK from the KBS to the AA. If you are worried about the security of the KEK, you also need to worry about the LEK, because anyone who gets hold of the LEK can decrypt your layers. That is why in my original diagram we planned to send the KEK rather than the LEK, for performance reasons.

@hdxia

I think @fitzthum is not talking about LEK traveling. His points include:

  • The initial proposal from me for SEV(-ES) is to inject KEK (in your term) through pre-attestation.
  • According to your suggestion, a wrapping key used to secure communication channel between AA and GOP/KBS can be injected through pre-attestation, and later AA can use this wrapping key to request KEK from KBS to decrypt LEK.
  • SEV-SNP can request the KEK from the KBS during a runtime/dynamic attestation procedure instead of a pre-attestation procedure.

@fitzthum
Member

* The initial proposal from me for SEV(-ES) is to inject KEK (in your term) through pre-attestation.

Yup, this seems like the easiest way to do SEV(-ES) support.

* According to your suggestion, a wrapping key used to secure communication channel between AA and GOP/KBS can be injected through pre-attestation, and later AA can use this wrapping key to request KEK from KBS to decrypt LEK.

We could do this with SEV(-ES), but it's more complex than the above and I'm not sure it has significant benefits.

* SEV-SNP can request the KEK from the KBS during a runtime/dynamic attestation procedure instead of a pre-attestation procedure.

Yup, this should be fairly similar to the TDX approach except that the SEV-SNP measurement has slightly different properties (meaning that we might need additional support to measure containers).

Are we all on the same page here? It seems like we have two categories: pre-attestation, and runtime attestation. At least for v0 seems like we are leaning towards just sending the KEK to the AA once the attestation passes.

@jiazhang0
Member Author

jiazhang0 commented Jul 23, 2021

Are we all on the same page here? It seems like we have two categories: pre-attestation, and runtime attestation. At least for v0 seems like we are leaning towards just sending the KEK to the AA once the attestation passes.

I have no objection to the two categories. And I think the "GetKey" style (sending the KEK to the AA once and caching it) and the "UnwrapKey" style (the KEK never leaves the KMS/KBS, and each unwrapKey request is relayed to the KBS) would be platform-specific implementations.
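The two styles can share one interface toward ocicrypt while differing in where the KEK lives. The sketch below is hypothetical (the `KBC`, `GetKeyKBC`, and `UnwrapKeyKBC` names are not taken from the AA codebase), and its stand-in functions perform no real cryptography: they only count KBS round trips, which is where the styles differ for an image whose layers share one KEK.

```go
package main

import "fmt"

// KBC is a hypothetical platform-specific key broker client inside the AA.
// ocicrypt only asks for an unwrapped LEK; implementations decide how.
type KBC interface {
	UnwrapKey(keyID string, wrappedLEK []byte) ([]byte, error)
}

// GetKeyKBC ("GetKey" style): fetch the KEK once, cache it in the TEE,
// and unwrap every LEK locally.
type GetKeyKBC struct {
	FetchKEK    func(keyID string) ([]byte, error)            // one KBS round trip
	UnwrapLocal func(kek, wrappedLEK []byte) ([]byte, error) // e.g. an AEAD open
	cache       map[string][]byte
}

func (g *GetKeyKBC) UnwrapKey(keyID string, wrappedLEK []byte) ([]byte, error) {
	if g.cache == nil {
		g.cache = make(map[string][]byte)
	}
	kek, ok := g.cache[keyID]
	if !ok {
		var err error
		if kek, err = g.FetchKEK(keyID); err != nil {
			return nil, err
		}
		g.cache[keyID] = kek
	}
	return g.UnwrapLocal(kek, wrappedLEK)
}

// UnwrapKeyKBC ("UnwrapKey" style): the KEK never leaves the KMS/KBS, so
// every wrapped LEK is relayed to the KBS.
type UnwrapKeyKBC struct {
	Relay func(keyID string, wrappedLEK []byte) ([]byte, error) // one round trip per call
}

func (u *UnwrapKeyKBC) UnwrapKey(keyID string, wrappedLEK []byte) ([]byte, error) {
	return u.Relay(keyID, wrappedLEK)
}

// unwrapLayers simulates ocicrypt unwrapping n layers that share one KEK.
func unwrapLayers(k KBC, n int) error {
	for i := 0; i < n; i++ {
		if _, err := k.UnwrapKey("kek-1", []byte{byte(i)}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	trips := 0
	getKey := &GetKeyKBC{
		FetchKEK: func(string) ([]byte, error) {
			trips++ // stand-in for a KBS request
			return []byte("kek"), nil
		},
		UnwrapLocal: func(_, w []byte) ([]byte, error) { return w, nil }, // placeholder, no crypto
	}
	if err := unwrapLayers(getKey, 3); err != nil {
		panic(err)
	}
	fmt.Println("GetKey round trips:", trips) // 1

	trips = 0
	unwrap := &UnwrapKeyKBC{
		Relay: func(_ string, w []byte) ([]byte, error) {
			trips++ // stand-in for a KBS request
			return w, nil
		},
	}
	if err := unwrapLayers(unwrap, 3); err != nil {
		panic(err)
	}
	fmt.Println("UnwrapKey round trips:", trips) // 3
}
```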

@jialez0
Member

jialez0 commented Jul 25, 2021

@fitzthum @jiazhang0 I will submit the initial code implementation with a sample KBC module that uses a hardcoded KEK.

@jiazhang0
Member Author

@larrydewey

First, thank you for your patience, as I am still coming up to speed with some of the concepts being relayed here.

In reference to @fitzthum's graphic, would it be possible, and would it make sense, to remove the attestation-agent from each VM and build it as its own standalone image? I recognize that one downside of this approach is that the attestation-agent image would need to be spun up first, but it could also dramatically decrease the number of direct connections to the CPU. It would also change the scalability of the attestation agent, since the relationship would now be one-to-many.

@fitzthum
Member

fitzthum commented Aug 2, 2021

@larrydewey

I suspect it's best to have the attestation agent in the same enclave as the workloads. For one thing, the interface between ocicrypt-rs and the attestation agent isn't really designed to cross the trust boundary. To decrypt a confidential workload, ocicrypt-rs will use RPC to ask the attestation agent to unwrap a key for each layer of the image. The attestation agent gives back an unwrapped key, which allows access to a layer. If the attestation agent were in a different VM from ocicrypt-rs, we would have to rework this communication to make sure the unwrapped key is not exposed. We could do something like that, but I'm not sure what the benefit is.

In SEV(-ES) there is no direct connection between the attestation agent and the CPU. The AA uses keys that are injected via the launch secret mechanism. QEMU facilitates this via QMP on the host. A kernel module in the guest pulls the injected secret from a GPA and puts it into the filesystem. The AA just needs to read a file to access the secret. The host will have to connect to the PSP to provide the encrypted launch secret (on behalf of the guest owner), but QEMU will have to connect to the PSP anyway as part of the boot sequence.

For SEV-SNP there is a direct connection between the AA and the PSP, but I think this may be required. For SNP the AA is responsible for setting up a secure connection to the KBS/GOP. To do so the AA requests an attestation report from the PSP. AFAIK attestation reports must be requested from inside the VM that they pertain to. Something else would have to take care of this if we moved the AA. I could be missing something here with minimizing direct connections. Is there a limit to the number of connections to the PSP? Is there a significant cost?

Finally, it might be worth noting that the attestation agent would probably have to be tweaked to support a one-to-many relationship with nodes, particularly if one AA were servicing multiple tenants. In the current plan there is a one-to-many relationship between the GOP/KBS and the AA and a one-to-one relationship between the AA and the VM. Having the AA inside the VM seems like the simplest approach.

@larrydewey

larrydewey commented Aug 3, 2021

@fitzthum

... If the attestation agent were in a different vm from ocicrypt-rs, we would have to rework this communication to make sure the unwrapped key is not exposed. We could do something like that, but I'm not sure what the benefit is.

Correct, adjusting for my suggested changes would mean a substantial deviation from the original design, so I really appreciate the opportunity to discuss it! I can see the adjustment significantly impacting both scalability and performance. Instead of creating an instance of the AA in each kata container, you could have far fewer instances for large-scale deployments while still accomplishing the same task.

In SEV(-ES) there is no direct connection between the attestation agent and the CPU ...

You're right. My initial thought was based on SEV-SNP, so SEV(-ES) is not directly related to this discussion.

... AFAIK attestation reports must be requested from inside the VM that they pertain to. Something else would have to take care of this if we moved the AA. I could be missing something here with minimizing direct connections. ...

Correct. The attestation report must be requested from within the guest. As we are planning to use (g|tt)RPC, we should still be able to accomplish this. We would need to secure the RPC connection between the AA and the guest, and from my understanding the protocol should have this security built into it.

... Is there a limit to the number of connections to the PSP? Is there a significant cost?

My usage of "connections" was probably the wrong word to describe my concern. The real issue would be surrounding the requests (for lack of a better word) being made to the PSP. The first question is, is the PSP in ring buffer mode, or is it in mailbox mode?

  • If the PSP is in ring buffer mode, then there is a maximum of 256 requests (the size of the buffer). They are all queued and executed one at a time, but no interrupt is generated until every action is finished.
  • If the PSP is in mailbox mode then one request is processed at a time. The original request is passed to the mailbox buffer, read in, and performed. All additional requests, although possibly writable to the buffer, would essentially be rejected. Any residual content residing in the buffer would be overwritten with the output of the originally processed command.

If the AA is separated, it would provide a somewhat centralized queue for the report requests to be made.

@larrydewey

After some additional research, the concerns mentioned above have been resolved. The Linux driver written for interacting with the PSP handles the necessary queuing of the requests. The Windows driver utilizes the ring buffer mode.

@ariel-adam
Member

@jiazhang0 is this issue still relevant or can be closed?
If it's still relevant to what release do you think we should map it to (mid-November, end-December, mid-February etc...)?

@jialez0 jialez0 closed this as completed Oct 12, 2022
@dcmiddle dcmiddle transferred this issue from confidential-containers/attestation-agent Jul 2, 2023