End to end encryption of traffic with ACM managed certs #39
Comments
This is very interesting indeed! I was planning to use Istio to enable end-to-end encryption between microservices, so that we have no chance of connecting to the wrong services due to reused VPC IPs. But maintaining Istio CA/Auth/Citadel and the other parts of Istio's control plane, plus its foundation, a K8s cluster, just for a service mesh seemed like overkill. I'd expect App Mesh integrated with ACM to provide the same benefit without the operational burden.
This is the biggest blocker for us moving from the traditional serviceA → ALB → serviceB approach to serviceA → AppMesh → serviceB; we always want inter-service requests to use HTTPS with ACM-issued certificates. But I can't see how this will be possible in the current model: the App Mesh architecture has the Envoy proxy running under our (the AWS user's) control as a sidecar container rather than under AWS' control (as is the case with ALB etc). ACM must not hand the certificate/privkey over to Envoy. So App Mesh would need to introduce another layer of proxy, or perhaps integrate ALB to terminate the HTTPS/TLS. I'm very keen to hear more about how this might work.
@pda I would like to understand more about the concern around "ACM must not hand the certificate/privkey over to Envoy". I am assuming this is in the context of TLS termination on the Envoy for incoming traffic on the service endpoint. If the cert pair is specific to the service, then what is the concern in giving the secrets to the Envoy that is going to terminate TLS?
@kiranmeduri Thanks for the reply. When I say “ACM must not …” I'm referring to my understanding that ACM by design always keeps certificates/secrets in an AWS-managed context where the AWS customer cannot access them. E.g. there's no way a customer can retrieve a certificate private key attached to an ALB. App Mesh, by contrast, has Envoy running in customer-managed space. If App Mesh / ACM passes the certificate & private key to the Envoy proxy, wouldn't it be possible for the AWS customer to access/exfiltrate it? It's quite likely I'm misunderstanding some aspect of this.
@pda You are correct for ACM certificates that are publicly verifiable: the private key cannot be retrieved. For private certificates (from ACM PCA), the private key can be retrieved on behalf of the customer through a secure channel. Private certificates would be useful for service-to-service communication (and mTLS) within a VPC or other private network. I'm currently researching a number of scenarios we'd like to support, both public and private, and will follow up here once I have a good handle on what we're proposing.
Thanks for the clarification, @bcelenza. I look forward to hearing more on this front.
Any updates here?
@tom-schultz I'm currently talking with the ACM team and working on a design proposal for this feature. I'll have an update here soon. In the meantime, I'm looking for more input from anyone willing to take the time. Here are some questions I have. Feel free to answer any that pertain to you. And of course, if I've missed something you feel needs mention, I'd be happy to hear that as well. A big thanks in advance to anyone who takes the time to provide additional input here.
Questions
I'm also curious, for any customers who use ACM PCA today, do you always use ACM-validated domains, or do you use the
For the curiosity question, I mostly use ACM-validated domains, but I have one scenario where I need an internal domain hosted on our intranet, for which I use our internal PKI to issue the certificate (DC applications making calls to my VPC-hosted app). This feature right here is what prevents me from using App Mesh. I need E2E encryption and I am making do with NLB and Vault-issued certs; I would love to drop Vault.
Questions
A single one per mesh, I guess. We have multiple products, each using its own mesh and CA. But these are separated by AWS account, so it's fine for us to have a 1:1 mapping between PCA and mesh.
No special requirements beyond following common security guidelines (1-2 years). However, we prefer automatic issuance, at which point you can rotate much more frequently.
Ideally yes. We'd prefer it to interface with ACM, similar to CloudFront distributions (create new / reuse existing). Apart from that, we'd love to have authentication too, if possible.
It would be cool if this could possibly integrate with AWS Private CA.
App Mesh will soon be adding support for enabling TLS between services in the mesh. This first pass will allow you to provide a certificate directly from AWS Certificate Manager (ACM) and enable TLS for a given VirtualNode listener. VirtualNodes that act as downstream clients of a TLS-enabled VirtualNode will automatically receive the appropriate validation context to validate the certificate you provide. With this change, you will be able to use the following options to secure traffic between services:
Please note that at this time you cannot use a public certificate provided by ACM. To enable TLS with a private or imported certificate, we're proposing the following API settings on the VirtualNode listener.
These changes will enable TLS between services with the use of a server certificate. Please note that client certificates for mTLS are covered in separate roadmap items (#34, #68). Let us know if these changes fit your service traffic encryption use cases, and if not, what else you'd like to see.
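For readers following along, here is a minimal sketch of what listener-level TLS settings backed by an ACM certificate could look like when built as a VirtualNode spec in Python. The proposed settings themselves are not reproduced in this thread, so the field names (`tls`, `mode`, `certificate.acm.certificateArn`) and the `DISABLED` mode are assumptions modeled on the shape of the App Mesh API; `build_tls_listener` and the placeholder ARN are hypothetical and exist only for illustration.

```python
import json

# STRICT and PERMISSIVE are mentioned later in this thread; DISABLED is an assumption.
VALID_TLS_MODES = {"STRICT", "PERMISSIVE", "DISABLED"}


def build_tls_listener(port, protocol, mode, certificate_arn):
    """Return a VirtualNode listener dict with TLS backed by an ACM certificate.

    Hypothetical helper: the field names are assumed, not taken from the proposal.
    """
    if mode not in VALID_TLS_MODES:
        raise ValueError(f"unsupported TLS mode: {mode!r}")
    return {
        "portMapping": {"port": port, "protocol": protocol},
        "tls": {
            "mode": mode,
            "certificate": {"acm": {"certificateArn": certificate_arn}},
        },
    }


# Placeholder ARN, purely illustrative.
EXAMPLE_CERT_ARN = "arn:aws:acm:us-west-2:111122223333:certificate/EXAMPLE"

listener = build_tls_listener(8080, "http", "STRICT", EXAMPLE_CERT_ARN)
spec = {"listeners": [listener]}
print(json.dumps(spec, indent=2))
```

A dict like `spec` would be what gets passed as the VirtualNode spec when creating or updating the node; the sketch deliberately stops short of calling any AWS API.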
That would be perfect @bcelenza
Heads up! For customers who will be using the ACM integration with App Mesh, you will need to update the IAM policy associated with the Envoy Proxy connecting to App Mesh's Envoy Management Service. See #80 for details.
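As a rough illustration of the kind of IAM policy change mentioned above, the sketch below builds a policy document in Python. The exact permissions are described in #80, which is not reproduced here; the two actions used (`acm:ExportCertificate` for the proxy serving the certificate and `acm-pca:GetCertificateAuthorityCertificate` for clients validating it) are an assumption based on the App Mesh TLS documentation, and `envoy_acm_policy` is a hypothetical helper. Treat #80 as authoritative.

```python
import json


def envoy_acm_policy(certificate_arn, certificate_authority_arn):
    """Build an IAM policy document for an Envoy task or pod role.

    Assumed actions (see #80 for the authoritative list):
      - acm:ExportCertificate lets the proxy retrieve the certificate it serves.
      - acm-pca:GetCertificateAuthorityCertificate lets clients fetch the CA
        certificate used to validate it.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "acm:ExportCertificate",
                "Resource": certificate_arn,
            },
            {
                "Effect": "Allow",
                "Action": "acm-pca:GetCertificateAuthorityCertificate",
                "Resource": certificate_authority_arn,
            },
        ],
    }


# Placeholder ARNs, purely illustrative.
policy = envoy_acm_policy(
    "arn:aws:acm:us-west-2:111122223333:certificate/EXAMPLE",
    "arn:aws:acm-pca:us-west-2:111122223333:certificate-authority/EXAMPLE",
)
print(json.dumps(policy, indent=2))
```

Scoping each statement to a specific ARN rather than `*` keeps the proxy's access limited to the certificate and CA it actually needs.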
Hey all, this is ready for trial in our preview environment. Check out the walkthrough to get started using TLS w/ ACM in App Mesh, and the docs for more info. Let us know what you think!
@bcelenza I noticed that it's recommended that new virtual nodes are created with TLS enabled. Can the TLS configuration not be applied to existing virtual nodes? If not, can you shed some light on why new virtual nodes are required? I'm hopeful that this will be possible once this feature goes GA. The implementation of App Mesh has been rather time-consuming for us, due to the AWS requirements that certain pieces of infrastructure be completely re-created (and not updated), such as enabling service discovery on existing ECS services, changing the type of target group to "ip" (needed for awsvpc enablement), etc.
We have identified some potential issues with this feature and are therefore disabling it while we investigate further. We will keep you updated on our progress. Apologies!
@joshuabaird The only reason we recommend creating a new virtual node is so you can carefully shift traffic from plaintext to TLS by way of a route. You can definitely apply certificates to existing nodes, but there is an eventual consistency concern between when the Envoy terminating TLS receives the update and when the clients originating TLS receive the validation context, which could cause traffic to fail for a short duration (seconds).
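The traffic shift described above could be sketched as a series of weighted route updates that move traffic step by step from a plaintext virtual node to a new TLS-enabled one. This is only an illustration: the node names are hypothetical, `shift_schedule` and `route_spec` are made-up helpers, and the `httpRoute` / `weightedTargets` layout is assumed to follow the App Mesh route spec.

```python
def shift_schedule(plaintext_node, tls_node, steps):
    """Return a list of weightedTargets lists, from 100% plaintext to 100% TLS."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    schedule = []
    for i in range(steps + 1):
        tls_weight = round(100 * i / steps)
        schedule.append(
            [
                {"virtualNode": plaintext_node, "weight": 100 - tls_weight},
                {"virtualNode": tls_node, "weight": tls_weight},
            ]
        )
    return schedule


def route_spec(weighted_targets):
    """Wrap weighted targets in an HTTP route spec matching all paths (assumed layout)."""
    return {
        "httpRoute": {
            "match": {"prefix": "/"},
            "action": {"weightedTargets": weighted_targets},
        }
    }


# Hypothetical node names; each printed step would be applied as one route update.
schedule = shift_schedule("serviceB-plaintext", "serviceB-tls", steps=4)
for targets in schedule:
    print([(t["virtualNode"], t["weight"]) for t in targets])
```

Applying one step at a time, and watching error rates before moving on, is what makes the new-node approach safer than flipping TLS on an existing node in place.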
Just to check, does this mean if you have a private certificate, it is not recommended to use App Mesh? Are you going to work on it in the future, or is it not on the roadmap for the near future? Thanks!
@oceaneLonneux Specifically, it means that if you have a public certificate, it is not recommended for use with App Mesh and Envoy. Instead, we would recommend you terminate TLS with a Network, Application, or Classic load balancer using the public certificate, then re-encrypt the traffic to the Envoy Proxy using a private certificate. Public TLS termination is in scope for the future w/ App Mesh and Envoy. Hope that helps!
Hey all, shortly after launch into the preview channel, we found a bug that we did not feel comfortable exposing to customers, and temporarily disabled access to the feature. Since posting the initial launch announcement and subsequent bug finding, we’ve been working to find a way to resolve the discovered issue. In the coming weeks we’ll have a fix and will re-release the feature in the App Mesh preview channel. However, upon re-release App Mesh will only support certificates from ACM which have been issued by an ACM Private Certificate Authority. We’ll follow up when we have a new plan for using imported certificates from ACM with App Mesh. We’d love to receive more feedback on how you rank the following certificate types from ACM with your own service mesh use cases:
+1 for 3. Imported certificates stored by ACM
+1 for 3. Option 2 would also be great if we could use the Private Certificate Authority without any additional cost or at a much lower cost. Thanks for your great work, and I hope you can find a better way soon.
ACM certificates issued by ACM Private Certificate Authority are good for me.
This feature has been re-enabled in our preview environment and is ready for testing. See the updated walkthrough for the latest information on how it works. We're still working on enabling imported certificates with ACM, and will follow up here when we have more news. Looking forward to your feedback!
@bcelenza How does Envoy know to trust the private CA generated by PCA? Is Envoy simply configured not to validate the CA certificate?
@joshuabaird App Mesh will automatically distribute the appropriate validation chain. Keep in mind that this is simply to support encrypting the traffic, and is not intended to authenticate the certificate in any way. We will be adding explicit TLS validation context for Virtual Node backends in the near future, which will allow you to set which CA you want to use to validate a given backend's certificates. Hopefully that makes sense.
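To make the idea of an explicit validation context for backends more concrete, here is a hedged sketch of a client policy that pins the CA used to validate backend certificates. The feature was not yet available when the comment above was written, so the field names (`backendDefaults`, `clientPolicy`, `tls.validation.trust.acm.certificateAuthorityArns`) are assumptions modeled on the shape of the App Mesh API, and `build_backend_defaults` plus the placeholder ARN are hypothetical.

```python
import json


def build_backend_defaults(certificate_authority_arns, enforce=True):
    """Return a backendDefaults dict trusting only the given ACM PCA authorities.

    Hypothetical helper; field names are assumed rather than quoted from the thread.
    """
    if not certificate_authority_arns:
        raise ValueError("at least one certificate authority ARN is required")
    return {
        "clientPolicy": {
            "tls": {
                "enforce": enforce,
                "validation": {
                    "trust": {
                        "acm": {
                            "certificateAuthorityArns": list(certificate_authority_arns)
                        }
                    }
                },
            }
        }
    }


# Placeholder ARN, purely illustrative.
EXAMPLE_CA_ARN = "arn:aws:acm-pca:us-west-2:111122223333:certificate-authority/EXAMPLE"

backend_defaults = build_backend_defaults([EXAMPLE_CA_ARN])
print(json.dumps({"backendDefaults": backend_defaults}, indent=2))
```

The design point is the one made in the comment: encryption alone does not authenticate the peer, whereas naming the trusted CA lets the client reject certificates issued by anyone else.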
Context: using EKS. I have a somewhat strange problem with this feature. Namely, TLS works fine only when just one VirtualNode is configured to use TLS. When adding TLS to another VirtualNode, requests start failing. Let's say I have serviceA and serviceB. VirtualNodes of both services are configured to use TLS (PERMISSIVE or STRICT mode). However, when I curl serviceA (either http or https), the requests fail ( Now, when I
then requests (both http and https) to both services work. But after I restart serviceA pods again, I'm back to the beginning. I don't see anything suspicious in envoy logs. Also, request to envoy's Could I be doing something wrong or is this a bug? Shall I create a separate issue for this and/or add more info?
Hey @sviik, sounds strange indeed. Would you be willing to send me an email (celenzab at amazon dot com) with the following?
That should help me narrow it down. Meanwhile, I'll try and reproduce this with the same steps. Thanks!
@sviik Thanks for sending along the logs and additional context, it was super helpful. It appears in this case it's a bug on our side. Specifically, we're sometimes sending a secret (i.e. TLS validation context or TLS certificate) to Envoy before it has received the associated resource (in your case, the ingress Listener). When this happens, Envoy appropriately ignores the secret. Once the Listener is received, it sends a request for the new secret, but we're ignoring that request since we assume we've already sent it. This is also why you saw it working when you removed TLS from serviceB, restarted the pod for serviceA, and re-added it: we sent the TLS certificate for serviceA only, then sent the TLS validation context for serviceB later when it had been re-added, and all the sequencing occurred in the correct order. We'll work on getting this fixed and I'll report back here once it's available in preview. Given this is essentially a race condition between Envoy and our management server, restarting the pods may resolve the issue in some cases for now.
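The sequencing problem described above can be modeled with a small toy simulation. This is a deliberately simplified sketch, not the actual management server or Envoy logic: the `ManagementServer` and `Envoy` classes, the `dedupe_requests` flag, and the secret name are all hypothetical, and real secret discovery involves versioned requests and acknowledgements that are omitted here.

```python
class ManagementServer:
    """Toy control plane that remembers which secrets it has already sent."""

    def __init__(self, dedupe_requests):
        # dedupe_requests=True models the bug: re-requests for a secret that
        # was already sent are ignored.
        self.dedupe_requests = dedupe_requests
        self.sent = set()

    def push_secret(self, envoy, name):
        self.sent.add(name)
        envoy.receive_secret(name)

    def handle_secret_request(self, envoy, name):
        if self.dedupe_requests and name in self.sent:
            return  # assumes the secret was already delivered
        self.push_secret(envoy, name)


class Envoy:
    """Toy proxy that only keeps secrets referenced by a known listener."""

    def __init__(self):
        self.wanted = set()
        self.secrets = set()

    def receive_secret(self, name):
        if name in self.wanted:
            self.secrets.add(name)
        # Otherwise the secret is dropped: no resource references it yet.

    def receive_listener(self, secret_name, server):
        self.wanted.add(secret_name)
        if secret_name not in self.secrets:
            server.handle_secret_request(self, secret_name)

    def tls_ready(self, secret_name):
        return secret_name in self.secrets


def run(dedupe_requests, secret_first):
    """Return True if the proxy ends up holding the listener's secret."""
    server = ManagementServer(dedupe_requests)
    envoy = Envoy()
    if secret_first:
        server.push_secret(envoy, "serviceA-cert")  # arrives before the listener
    envoy.receive_listener("serviceA-cert", server)
    return envoy.tls_ready("serviceA-cert")


print(run(dedupe_requests=True, secret_first=True))    # False: secret lost
print(run(dedupe_requests=True, secret_first=False))   # True: ordering was lucky
print(run(dedupe_requests=False, secret_first=True))   # True: re-request honored
```

In this model the failure only appears when the secret arrives before the listener and the server refuses to resend it, which matches why a restart (and a different ordering) can make the problem disappear.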
@bcelenza Do you have any idea when this will make it into the App Mesh SDKs (specifically Ruby, if that matters)?
@joshuabaird It will land in all SDKs with the same release, but we can't add it to the SDKs until we're ready to call it GA (at which point the API/SDK is final). A rough estimate is within the next few weeks. I take it you're looking to get started on an integration with the Ruby SDK before we launch? There is a way to get the new model in an existing SDK: you just need to replace the API model JSON. For example, I have a forked version of the Go SDK with a hacky Makefile target that creates a preview variant of App Mesh and uses the published preview model to build the client.
It's been a little while since we checked in on this issue. Sorry about that! We're going to start providing more regular updates on progress in individual issues. We're in the final launch preparations for this feature. Specifically, we're load testing to ensure we can meet our scaling objectives, and that the correct behavior of Envoy is observed under these conditions. The feature is still available in preview, and we're happy to receive any additional feedback, so please give it a try if you haven't yet. What's coming after this? Here's a quick run-down of our priorities: