End to end encryption of traffic with ACM managed certs #39

Open
jamsajones opened this issue Nov 28, 2018 · 33 comments

Comments

@jamsajones

@jamsajones jamsajones commented Nov 28, 2018

No description provided.

@mumoshu


@mumoshu mumoshu commented Dec 5, 2018

This is very interesting indeed!

I was planning to use Istio to enable end-to-end encryption between microservices, so that there is no chance of connecting to the wrong service due to reused VPC IPs. But maintaining Istio CA/Auth/Citadel and the other parts of Istio's control plane and its foundation, a K8S cluster, just for a service mesh seemed like overkill.

I'd expect App Mesh integrated with ACM to provide the same benefit, without the operational burden.

@pda


@pda pda commented Dec 11, 2018

This is the biggest blocker for us moving from traditional serviceA → ALB → serviceB approach to serviceA → AppMesh → serviceB — we always want inter-service requests to use HTTPS with ACM-issued certificates.

But, I can't see how this will be possible in the current model; the App Mesh architecture has the Envoy proxy running under our (the AWS user) control as a sidecar container rather than under AWS' control (as is the case with ALB etc). ACM must not hand the certificate/privkey over to Envoy. So App Mesh would need to introduce another layer of proxy, or perhaps integrate ALB to terminate the HTTPS/TLS.

I'm very keen to hear more about how this might work.

@bcelenza bcelenza changed the title End to snd encryption of traffic with ACM managed certs End to end encryption of traffic with ACM managed certs Dec 31, 2018
@kiranmeduri


@kiranmeduri kiranmeduri commented Mar 28, 2019

@pda I would like to understand more about the concern around "ACM must not hand the certificate/privkey over to Envoy". I am assuming this is in the context of TLS termination on the Envoy for incoming traffic on the service endpoint. If the cert pair is specific to the service then what is the concern in giving the secrets to Envoy that is going to terminate TLS?

@coultn coultn transferred this issue from aws/aws-app-mesh-examples Mar 28, 2019
@jamsajones jamsajones added this to Researching in aws-app-mesh-roadmap Mar 28, 2019
@pda


@pda pda commented Mar 28, 2019

@kiranmeduri Thanks for the reply.

When I say “ACM must not …” I'm referring to my understanding that ACM by design always keeps certificates/secrets in AWS-managed context where the AWS customer cannot access it. e.g. there's no way a customer can retrieve a certificate private key attached to an ALB.

Whereas App Mesh has envoy running in customer-managed space. If App Mesh / ACM passes the certificate & private key to the envoy proxy, wouldn't it be possible for the AWS customer to access/exfiltrate it?

It's quite likely I'm misunderstanding some aspect of this.

@bcelenza

Contributor

@bcelenza bcelenza commented Apr 11, 2019

@pda You are correct for ACM certificates that are publicly verifiable: the private key cannot be retrieved. For private certificates (from ACM PCA), the private key can be retrieved on behalf of the customer through a secure channel. Private certificates would be useful for service-to-service communication (and mTLS) within a VPC or other private network.

I'm currently researching a number of scenarios we'd like to support, both public and private, and will follow-up here once I have a good handle on what we're proposing.

@pda


@pda pda commented Apr 12, 2019

Thanks for the clarification @bcelenza — I look forward to hearing more on this front 👍

@bcelenza bcelenza self-assigned this Apr 12, 2019
@tom-schultz


@tom-schultz tom-schultz commented Apr 24, 2019

Any updates here?

@bcelenza

Contributor

@bcelenza bcelenza commented Apr 26, 2019

@tom-schultz I'm currently talking with the ACM team and working on a design proposal for this feature. I'll have an update here soon.

In the meantime, I'm looking for more input from anyone willing to take the time.

Here are some questions I have. Feel free to answer any that pertain to you. And of course, if I've missed something you feel needs mention, I'd be happy to hear that as well.

A big thanks in advance for anyone who takes the time to provide additional input here.

Questions

  1. Would your service mesh use a single private certificate authority, or multiple? If multiple, what use cases govern that need?
  2. How frequently would you want to renew certificates for your service mesh?
  3. Would you want App Mesh to automatically issue and/or renew certificates for your service mesh?

I'm also curious, for any customers who use ACM PCA today, do you always use ACM-validated domains, or do you use the issue-certificate and import-certificate APIs for certain things?

@alvarow


@alvarow alvarow commented May 6, 2019

@bcelenza

  1. I don't have a preference in that regard. My preference would be to not manage a CA! I'd happily use either Amazon Public CA, or a private CA on my account.

  2. I am not particularly picky in this one, but the security team in my company prefers 1 year certs, no longer than that. Certificate Manager fits nicely.

  3. Most definitely!

For the curiosity question, I mostly use ACM-validated domains, but I have one scenario where I need an internal domain hosted on our intranet, for which I use our internal PKI to issue the certificate (DC applications making calls to my VPC-hosted app).

This feature right here is what prevents me from using App Mesh. I need E2E encryption and I am making do with NLB and Vault issued certs, I would love to drop Vault.

@shubharao shubharao moved this from Researching to We're Working On It in aws-app-mesh-roadmap May 7, 2019
@arnuschky


@arnuschky arnuschky commented May 21, 2019

Questions

  1. Would your service mesh use a single private certificate authority, or multiple? If multiple, what use cases govern that need?

A single per mesh I guess. We have multiple products, each using their own mesh and CA. But these are separate by AWS accounts so fine for us to have a 1:1 map between PCA and mesh.

  2. How frequently would you want to renew certificates for your service mesh?

No special requirements, at least following common security guidelines (1-2 years). However, we prefer automatic issuance at which point you can rotate much more regularly.

  3. Would you want App Mesh to automatically issue and/or renew certificates for your service mesh?

Ideally yes. We'd prefer it to interface with ACM; similar to CloudFront distributions (create new / reuse existing).

Apart from that we'd love to have authentication too, if possible.

@ntwaddell


@ntwaddell ntwaddell commented Jun 5, 2019

It would be cool if this could integrate with the AWS Private CA possibly.

@bcelenza

Contributor

@bcelenza bcelenza commented Jun 10, 2019

App Mesh will soon be adding support for enabling TLS between services in the mesh. This first pass will allow you to provide a certificate directly from AWS Certificate Manager (ACM) and enable TLS for a given VirtualNode listener. VirtualNodes that act as downstream clients of a TLS-enabled VirtualNode will automatically receive the appropriate validation context to validate the certificate you provide.

With this change, you will be able to use the following options to secure traffic between services:

  1. A certificate issued by your Private Certificate Authority for which ACM manages the private key and certificate renewal (see Request a Private Certificate).
  2. A certificate that has been imported to ACM.

Please note that at this time you cannot use a public certificate provided by ACM.

To enable TLS with a private or imported certificate, we're proposing the following API settings on the VirtualNode listener.

$ aws appmesh create-virtual-node --mesh-name my-mesh \
    --virtual-node-name my-node \
    --spec
{
    "listeners": [
        {
            // Existing port mapping settings.
            "portMapping": {
                "port": 443,
                "protocol": "http"
            },
            // Optional settings for TLS configuration on this listener. When not
            // specified, TLS is disabled.
            "tls": {
                // (REQUIRED) Determines how TLS will be configured on the appropriate listener.
                // Allowed modes:
                // * STRICT: Listener only accepts connections with TLS enabled.
                // * PERMISSIVE: Listener accepts connections with or without TLS enabled.
                // * DISABLED: Listener only accepts connections without TLS.
                "mode": "STRICT",
                // (REQUIRED) Certificate settings for this listener.
                "certificate": {
                    "acm": {
                        // (REQUIRED) The ARN of the certificate to bind to this listener.
                        "certificateArn": "arn:aws:acm:region:123456789012:certificate/12345678-1234-1234-1234-123456789012"
                    }
                }
            }
        }
    ]
}
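
The annotated spec above is illustrative and is not valid JSON as written, since JSON has no comment syntax. A minimal sketch, assuming the field names from this proposal (`tls`, `mode`, `certificate.acm.certificateArn`; the released API may differ), that builds the same listener in Python and serializes it to plain JSON. The `build_tls_listener` helper is hypothetical, and the ARN is the placeholder from the proposal.

```python
import json

# Modes listed in the proposal above.
ALLOWED_TLS_MODES = {"STRICT", "PERMISSIVE", "DISABLED"}


def build_tls_listener(port, protocol, mode, certificate_arn):
    """Return a listener dict shaped like the proposed VirtualNode listener.

    Hypothetical helper: field names follow the proposal in this thread
    and are not guaranteed to match the final API.
    """
    if mode not in ALLOWED_TLS_MODES:
        raise ValueError(f"unsupported TLS mode: {mode!r}")
    return {
        "portMapping": {"port": port, "protocol": protocol},
        "tls": {
            "mode": mode,
            "certificate": {"acm": {"certificateArn": certificate_arn}},
        },
    }


# Placeholder ARN copied from the proposal; replace with a real certificate ARN.
EXAMPLE_ARN = (
    "arn:aws:acm:region:123456789012:certificate/"
    "12345678-1234-1234-1234-123456789012"
)

spec = {"listeners": [build_tls_listener(443, "http", "STRICT", EXAMPLE_ARN)]}
spec_json = json.dumps(spec, indent=4)

if __name__ == "__main__":
    print(spec_json)
```

The printed JSON is the comment-free equivalent of the spec shown above.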

These changes will enable TLS between services with the use of a server certificate. Please note that client certificates for mTLS are covered in separate roadmap items (#34, #68).

Let us know if these changes fit your service traffic encryption use cases, and if not, what else you'd like to see.

@ntwaddell


@ntwaddell ntwaddell commented Jun 10, 2019

That would be perfect @bcelenza

@bcelenza bcelenza moved this from We're Working On It to Coming Soon in aws-app-mesh-roadmap Jun 20, 2019
@bcelenza

Contributor

@bcelenza bcelenza commented Jul 3, 2019

Heads up! For customers who will be using the ACM integration with App Mesh, you will need to update the IAM policy associated with the Envoy Proxy connecting to App Mesh's Envoy Management Service. See #80 for details.

bcelenza added a commit to bcelenza/aws-app-mesh-roadmap that referenced this issue Aug 6, 2019
bcelenza added a commit to bcelenza/aws-app-mesh-examples that referenced this issue Aug 6, 2019
bcelenza added a commit to bcelenza/aws-app-mesh-examples that referenced this issue Aug 6, 2019
bcelenza added a commit to bcelenza/aws-app-mesh-examples that referenced this issue Aug 6, 2019
bcelenza added a commit to bcelenza/aws-app-mesh-examples that referenced this issue Aug 6, 2019
efe-selcuk added a commit to aws/aws-app-mesh-examples that referenced this issue Aug 6, 2019
@bcelenza

Contributor

@bcelenza bcelenza commented Aug 6, 2019

Hey all, this is ready for trial in our preview environment. Check out the walkthrough to get started using TLS w/ ACM in App Mesh, and the docs for more info. Let us know what you think!

@shubharao shubharao moved this from Coming Soon to Available in Preview Channel in aws-app-mesh-roadmap Aug 7, 2019
@joshuabaird


@joshuabaird joshuabaird commented Aug 7, 2019

@bcelenza I noticed that it's recommended that new virtual nodes are created with TLS enabled. Can the TLS configuration not be applied to existing virtual nodes? If not, can you shed some light on why new virtual nodes are required? Hopeful that this will be possible once this feature goes GA.

The implementation of App Mesh has been rather time consuming for us, due to the AWS requirements that certain pieces of infrastructure need to be completely re-created (and not updated) such as enabling service discovery on existing ECS services, changing the type of target-group to "ip" (needed for awsvpc enablement), etc.

@shubharao


@shubharao shubharao commented Aug 7, 2019

We have identified some potential issues with this feature and are therefore disabling it until we investigate further. We will keep you updated on the progress. Apologies!

bcelenza added a commit that referenced this issue Aug 7, 2019
* Add TLS structures to preview model

#38
#39

* Revert "Add TLS structures to preview model (#90)"

This reverts commit 442348c.
@bcelenza

Contributor

@bcelenza bcelenza commented Aug 7, 2019

@joshuabaird The only reason we recommend creating a new virtual node is so you can carefully traffic shift from plaintext to TLS by way of a route. You can definitely apply certificates to existing nodes, but there is an eventual consistency concern between when the Envoy terminating TLS receives the update and when the clients originating TLS receive the validation context, which could cause traffic to fail for a short duration (seconds).
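
The gradual shift described here can be sketched as a series of route specs whose weights move from a plaintext virtual node to a TLS-enabled one. This is a minimal sketch, assuming the `httpRoute.action.weightedTargets` shape of an App Mesh route spec; the node names `my-node-plaintext` and `my-node-tls`, the `shifting_route_specs` helper, and the step size are hypothetical.

```python
def shifting_route_specs(plaintext_node, tls_node, step=25):
    """Yield route specs that move traffic from plaintext_node to tls_node.

    Hypothetical helper. Every spec keeps the two weights summing to 100,
    starting at 100/0 and ending at 0/100.
    """
    if not 0 < step <= 100:
        raise ValueError("step must be in (0, 100]")
    tls_weight = 0
    while True:
        yield {
            "httpRoute": {
                "match": {"prefix": "/"},
                "action": {
                    "weightedTargets": [
                        {"virtualNode": plaintext_node, "weight": 100 - tls_weight},
                        {"virtualNode": tls_node, "weight": tls_weight},
                    ]
                },
            }
        }
        if tls_weight == 100:
            break
        tls_weight = min(100, tls_weight + step)


# Hypothetical node names; each spec would be applied as one route update.
specs = list(shifting_route_specs("my-node-plaintext", "my-node-tls"))
```

Applying the specs one at a time (for example with `aws appmesh update-route`) and checking error rates between steps keeps any consistency window confined to the share of traffic already moved.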

@oceaneLonneux


@oceaneLonneux oceaneLonneux commented Aug 9, 2019

Please note that at this time you cannot use a public certificate provided by ACM.

Just to check, does this mean if you have a private certificate, it is not recommended to use App Mesh? Are you guys gonna work on it in the future or it's not on the roadmap in the near future?

Thanks!

@bcelenza

Contributor

@bcelenza bcelenza commented Aug 12, 2019

@oceaneLonneux Specifically it means if you have a public certificate it is not recommended to use with App Mesh and Envoy. Instead we would recommend you terminate TLS with a Network, Application, or Classic load balancer using the public certificate, then re-encrypt the traffic to the Envoy Proxy using a private certificate.

Public TLS termination is in scope for the future w/ App Mesh and Envoy.

Hope that helps!

@bcelenza

Contributor

@bcelenza bcelenza commented Aug 13, 2019

Hey all,

Shortly after launch into the preview channel, we found a bug that we did not feel comfortable exposing to customers, and temporarily disabled access to the feature. Since posting the initial launch announcement and subsequent bug finding, we’ve been working to find a way to resolve the discovered issue.

In the coming weeks we’ll have a fix and will re-release the feature in the App Mesh preview channel. However, upon re-release App Mesh will only support certificates from ACM which have been issued by an ACM Private Certificate Authority. We’ll follow up when we have a new plan for using imported certificates from ACM with App Mesh.

We’d love to receive more feedback on how you rank the following certificate types from ACM with your own service mesh use cases:

  1. Public certificates issued by ACM
  2. Private certificates issued by ACM from a Private Certificate Authority
  3. Imported certificates stored by ACM
@alvarow


@alvarow alvarow commented Aug 13, 2019

+1 for 3. Imported certificates stored by ACM

@pottava


@pottava pottava commented Aug 14, 2019

+1 for 3. Option 2 will be also great if we can use the Private Certificate Authority without any additional cost or with much lower costs. Thanks for your great work and I hope you could find a better way soon.

@ntwaddell


@ntwaddell ntwaddell commented Aug 14, 2019

ACM certificates issued by an ACM Private Certificate Authority are good for me

@bcelenza bcelenza moved this from Coming Soon to Available in Preview Channel in aws-app-mesh-roadmap Sep 6, 2019
@bcelenza

Contributor

@bcelenza bcelenza commented Sep 6, 2019

This feature has been re-enabled in our preview environment and is ready for testing. See the updated walkthrough for the latest information on how it works.

We're still working on enabling imported certificates with ACM, and will follow up here when we have more news.

Looking forward to your feedback!

@joshuabaird


@joshuabaird joshuabaird commented Sep 9, 2019

@bcelenza How does Envoy know to trust the private CA generated by PCA? Is Envoy simply configured not to validate the CA certificate?

@bcelenza

Contributor

@bcelenza bcelenza commented Sep 9, 2019

@joshuabaird App Mesh will automatically distribute the appropriate validation chain. Keep in mind that this is simply to support encrypting the traffic, and does not intend to serve as authenticating the certificate in any way.

We will be adding explicit TLS validation context for Virtual Node backends in the near future, which will allow you to set which CA you want to use to validate a given backend's certificates. Hopefully that makes sense.

@sviik


@sviik sviik commented Sep 23, 2019

Context: using EKS.

I have a somewhat strange problem with this feature. Namely, TLS works fine only when just one VirtualNode is configured to use TLS. When adding TLS to another VirtualNode, requests start failing.

Let's say I have serviceA and serviceB. The VirtualNodes of both services are configured to use TLS (PERMISSIVE or STRICT mode). However, when I curl serviceA (over either http or https), the requests fail (Connection refused). Requests to serviceB succeed.

Now, when I

  1. Remove TLS from serviceB VirtualNode
  2. Restart serviceA pod(s)
  3. Add TLS back to serviceB VirtualNode

then requests (both http and https) to both services work. But after I restart serviceA pods again, I'm back to the beginning.

I don't see anything suspicious in envoy logs. Also, request to envoy's http://localhost:9901/certs does output cert info.

Could I be doing something wrong or is this a bug? Shall I create a separate issue for this and/or add more info?

@bcelenza

Contributor

@bcelenza bcelenza commented Sep 23, 2019

Hey @sviik, sounds strange indeed. Would you be willing to send me an email (celenzab at amazon dot com) with the following?

  1. Envoy container logs for service a and b (particularly service a before and after restart)
  2. Your AWS Account ID?

That should help me narrow it down. Meanwhile, I'll try and reproduce this with the same steps. Thanks!

@bcelenza

Contributor

@bcelenza bcelenza commented Sep 23, 2019

@sviik Thanks for sending along the logs and additional context, it was super helpful.

It appears in this case it's a bug on our side. Specifically, we're sometimes sending a secret (i.e. TLS validation context or TLS certificate) to Envoy before it has received the associated resource (in your case, the ingress Listener). When this happens, Envoy appropriately ignores the secret. Once the Listener is received, it sends a request for the new secret, but we're ignoring that request since we assume we've already sent it.

This is also why you saw it working when you removed TLS from serviceB, restarted the pod for serviceA, and re-added it: we sent the TLS certificate for serviceA only, then sent the TLS validation context for serviceB later when it had been re-added, and all the sequencing occurred in the correct order.

We'll work on getting this fixed and I'll report back here once it's available in preview. Given this is essentially a race condition between Envoy and our management server, restarting the pods may resolve the issue in some cases for now.
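
The ordering problem described above can be reproduced in a toy model. This is only a sketch: `ToyProxy` and `ToyServer` are hypothetical stand-ins and do not reflect Envoy's or App Mesh's actual code, and answering repeat requests is just one way the behavior could be corrected, not necessarily the fix that will ship.

```python
class ToyProxy:
    """Stand-in for Envoy: only keeps a secret once its listener exists."""

    def __init__(self):
        self.has_listener = False
        self.secret = None

    def receive_secret(self, secret):
        # A secret that arrives before its listener is ignored.
        if self.has_listener:
            self.secret = secret

    def receive_listener(self, server):
        self.has_listener = True
        # With the listener in place, explicitly request its secret.
        server.handle_secret_request(self)


class ToyServer:
    """Stand-in for the management server."""

    def __init__(self, answer_repeat_requests):
        self.answer_repeat_requests = answer_repeat_requests
        self.already_sent = False

    def push_secret(self, proxy):
        self.already_sent = True
        proxy.receive_secret("tls-cert")

    def handle_secret_request(self, proxy):
        # The bug: a request is dropped if the secret was pushed earlier.
        if self.already_sent and not self.answer_repeat_requests:
            return
        self.push_secret(proxy)


def run(answer_repeat_requests):
    server = ToyServer(answer_repeat_requests)
    proxy = ToyProxy()
    server.push_secret(proxy)       # secret arrives before the listener
    proxy.receive_listener(server)  # listener arrives, proxy asks again
    return proxy.secret


buggy_result = run(answer_repeat_requests=False)
fixed_result = run(answer_repeat_requests=True)
```

In the buggy run the proxy ends up with a listener but no secret, which matches the refused connections reported above; when the server answers the repeat request, the proxy receives the certificate.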

@joshuabaird


@joshuabaird joshuabaird commented Sep 23, 2019

@bcelenza Do you have any idea when this will make it into the App Mesh SDKs (specifically for Ruby, if that matters)?

@bcelenza

Contributor

@bcelenza bcelenza commented Sep 23, 2019

@joshuabaird It will land in all SDKs with the same release, but we can't add it to the SDKs until we're ready to call it GA (at which point the API/SDK is final). A rough estimate will be within the next few weeks.

I take it you're looking to get started on an integration with the Ruby SDK before we launch? There is a way to get the new model in an existing SDK, you just need to replace the api model JSON.

For example, I have a forked version of the Go SDK and have a hacky Makefile target that creates a preview variant of App Mesh and uses the published preview model to build the client.

@bcelenza

Contributor

@bcelenza bcelenza commented Dec 7, 2019

It's been a little while since we checked in on this issue. Sorry about that! We're going to start providing more regular updates on progress in individual issues.

We're in the final launch preparations for this feature. Specifically, we're load testing to ensure we can meet our scaling objectives, and that the correct behavior of Envoy is observed under these conditions.

The feature is still available in preview, and we're happy to receive any additional feedback, so please give it a try if you haven't yet.

What's coming after this? Here's a quick run-down of our priorities:

  1. This proposal enables certificates to be provided via filesystem references and alternate SDS APIs. It also introduces client policies, which will allow you to enable (enforce) TLS negotiation from the client side with optional validation against a CA.
  2. We're working on the design for #34. Expect an update there soon.
  3. We're in the research phase for #140 (just created). We've been talking to customers about this, and it seems like a flexible solution to authorization. Please +1 and comment on that issue if it fits your use cases.
Projects
aws-app-mesh-roadmap
Available in Preview Channel