Is an existing page relevant?
NA
What karpenter features are relevant?
Karpenter node scaling in a fully private cluster, or in private subnets with no internet access (no internet gateway and no NAT gateway).
How should the docs be improved?
When creating an EKS cluster whose subnets have no route to the internet, make sure the requirements in the EKS documentation are met: https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html#private-cluster-requirements . In addition, you must create a regional STS VPC endpoint in your VPC. Otherwise the Karpenter controller cannot obtain its AWS credentials.
This is required because the Karpenter controller IAM role uses IRSA (IAM roles for service accounts), per https://karpenter.sh/docs/getting-started/#create-the-karpentercontroller-iam-role . Pods configured with IAM roles for service accounts acquire credentials through an AWS Security Token Service (AWS STS) API call. If there is no outbound internet access, you must therefore create and use an AWS STS VPC endpoint in your VPC.
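As a hedged sketch of creating the STS interface endpoint, the Python function below only builds the keyword arguments for boto3's `EC2.Client.create_vpc_endpoint` and does not call AWS itself. The function name and the VPC, subnet, and security group IDs are hypothetical placeholders. It assumes private DNS is enabled on the endpoint so that the standard regional hostname `sts.<region>.amazonaws.com` resolves to the endpoint's private IPs. The endpoint's security group should also allow inbound HTTPS (443) from the nodes.

```python
def interface_endpoint_request(region, service, vpc_id, subnet_ids, security_group_ids):
    """Build kwargs for boto3's ec2.create_vpc_endpoint (interface endpoint).

    Hypothetical helper: all IDs are caller-supplied and nothing here calls AWS.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # e.g. com.amazonaws.ap-south-1.sts
        "ServiceName": f"com.amazonaws.{region}.{service}",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Private DNS makes the normal regional hostname resolve to the endpoint.
        "PrivateDnsEnabled": True,
    }


if __name__ == "__main__":
    # Hypothetical IDs for illustration only.
    req = interface_endpoint_request(
        "ap-south-1", "sts", "vpc-0123456789abcdef0",
        ["subnet-0aaa", "subnet-0bbb"], ["sg-0ccc"],
    )
    print(req["ServiceName"])
    # To actually create the endpoint (requires boto3 and credentials):
    # import boto3
    # boto3.client("ec2", region_name="ap-south-1").create_vpc_endpoint(**req)
```

Passing `service="ssm"` builds the matching request for the SSM endpoint.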
Additionally, alongside the VPC endpoints required for EKS private clusters (https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html#vpc-endpoints-private-clusters ), Karpenter needs a VPC endpoint for SSM. When Karpenter launches a node, it resolves its launch template configuration, which includes reading an SSM parameter. If there is no SSM VPC endpoint in your VPC, the launch fails with the error below:
2022-01-17T07:49:09.992Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:49:11.993Z INFO controller.provisioning Batched 3 pods in 1.000572709s {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:49:12.058Z INFO controller.provisioning Computed packing of 1 node(s) for 3 pod(s) with instance type option(s) [c4.xlarge c6i.xlarge c5.xlarge c5d.xlarge c5a.xlarge c5n.xlarge m6i.xlarge m4.xlarge m6a.xlarge m5ad.xlarge m5d.xlarge t3.xlarge m5a.xlarge t3a.xlarge m5.xlarge r4.xlarge r3.xlarge r5ad.xlarge r6i.xlarge r5a.xlarge] {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:51:12.445Z ERROR controller.provisioning Could not launch node, launching instances, getting launch template configs, getting launch templates, getting ssm parameter, RequestError: send request failed
caused by: Post "https://ssm.ap-south-1.amazonaws.com/": dial tcp 52.95.89.90:443: i/o timeout {"commit": "5047f3c", "provisioner": "default"}
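The timeout above shows `ssm.ap-south-1.amazonaws.com` resolving to a public address (52.95.89.90), which cannot be reached without an internet or NAT gateway. The sketch below is a hypothetical diagnostic, not part of Karpenter, for checking from a node or pod inside the VPC whether a service hostname resolves only to private addresses. It assumes the interface endpoint has private DNS enabled and only looks at IPv4 results.

```python
import ipaddress
import socket


def resolved_addresses(host, port=443):
    """Return the unique IPv4 addresses the local resolver returns for host."""
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def all_private(addresses):
    """True if there is at least one address and every one is in a private range."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_private for a in addresses)
```

For example, `all_private(resolved_addresses("ssm.ap-south-1.amazonaws.com"))` returning `False` suggests the SSM endpoint is missing or lacks private DNS, since a working endpoint resolves to addresses inside the VPC CIDR (192.168.x.x in the logs here).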
The VPC endpoints required for a fully private EKS cluster running Karpenter are therefore:
com.amazonaws.<region>.ec2
com.amazonaws.<region>.ecr.api
com.amazonaws.<region>.ecr.dkr
com.amazonaws.<region>.s3 - For pulling container images
com.amazonaws.<region>.sts - For IAM roles for service accounts
com.amazonaws.<region>.ssm - If using Karpenter
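To audit an existing VPC against the list above, the sketch below compares the required service names with those already present. The existing names could come from, for example, `aws ec2 describe-vpc-endpoints --filters Name=vpc-id,Values=<vpc-id> --query 'VpcEndpoints[].ServiceName' --output text`. The function name and the `karpenter` flag are illustrative assumptions.

```python
# Services required for a private EKS cluster, per the list above.
REQUIRED_SERVICES = ["ec2", "ecr.api", "ecr.dkr", "s3", "sts"]


def missing_endpoints(region, existing_service_names, karpenter=True):
    """Return required VPC endpoint service names not yet present in the VPC.

    Hypothetical helper; ssm is added only when Karpenter is in use.
    """
    services = REQUIRED_SERVICES + (["ssm"] if karpenter else [])
    required = [f"com.amazonaws.{region}.{s}" for s in services]
    present = set(existing_service_names)
    return [name for name in required if name not in present]
```

In the failing setup described here, the result would have listed `com.amazonaws.ap-south-1.ssm` (and `sts`, if that endpoint was also absent).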
After adding the above VPC endpoints, the Karpenter controller successfully launched nodes for the unschedulable pods:
2022-01-17T07:53:14.998Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:53:17.009Z INFO controller.provisioning Batched 3 pods in 1.01010579s {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:53:17.638Z INFO controller.provisioning Computed packing of 1 node(s) for 3 pod(s) with instance type option(s) [c4.xlarge c5d.xlarge c6i.xlarge c5a.xlarge c5.xlarge c5n.xlarge m5d.xlarge t3.xlarge m4.xlarge m5.xlarge m6a.xlarge t3a.xlarge m6i.xlarge m5ad.xlarge m5a.xlarge r3.xlarge r4.xlarge r6i.xlarge r5.xlarge r5ad.xlarge] {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:54:20.740Z INFO controller.provisioning Launched instance: i-01e548d243ece1782, hostname: ip-192-168-189-164.ap-south-1.compute.internal, type: c6i.xlarge, zone: ap-south-1c, capacityType: spot {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:54:20.784Z INFO controller.provisioning Bound 3 pod(s) to node ip-192-168-189-164.ap-south-1.compute.internal {"commit": "5047f3c", "provisioner": "default"}
It would be helpful to add this to the Karpenter documentation for the use case above, i.e. using Karpenter with private EKS clusters.
Additional note: the Karpenter container images must also be in, or copied to, Amazon ECR or a registry inside the VPC so they can be pulled, per https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html#container-images . The Karpenter controller and webhook pods currently pull their images from Amazon ECR Public (public.ecr.aws), so if local copies of the images are not created, you will see the error below:
Warning Failed 17m (x3 over 19m) kubelet Failed to pull image "public.ecr.aws/karpenter/controller:v0.5.3@sha256:ddd24d756cb324cf8f91f2274621646f83d6121ed6856312ca672a5f78c57174": rpc error: code = Unknown desc = Error response from daemon: Get "https://public.ecr.aws/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
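After mirroring the images (pulling them from a machine with internet access and pushing them to a private ECR repository), the controller and webhook need to reference the private copies. The sketch below maps a `public.ecr.aws` reference to the matching private ECR reference. The function name and the account ID `111122223333` are hypothetical. It assumes the private repository uses the same path as the public one (e.g. `karpenter/controller`). How you override the image in your install depends on your install method.

```python
def private_ecr_image(public_ref, account_id, region):
    """Map a public.ecr.aws image reference to a private ECR reference.

    The @sha256 digest is dropped because a re-pushed copy may not keep the
    same digest; re-pin the digest after mirroring if you rely on pinning.
    """
    prefix = "public.ecr.aws/"
    if not public_ref.startswith(prefix):
        raise ValueError(f"not a public.ecr.aws reference: {public_ref}")
    # Keep repository path and tag, strip any "@sha256:..." suffix.
    repo_and_tag = public_ref[len(prefix):].split("@", 1)[0]
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_and_tag}"
```

For the image in the error above, this yields `111122223333.dkr.ecr.ap-south-1.amazonaws.com/karpenter/controller:v0.5.3`.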
Community Note
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment