Karpenter with Amazon EKS private cluster without outbound internet access #1157

Closed
gowthams316 opened this issue Jan 17, 2022 · 3 comments
Labels
documentation Improvements or additions to documentation help-wanted Extra attention is needed

Comments

gowthams316 commented Jan 17, 2022

Is an existing page relevant?

NA

What karpenter features are relevant?

Karpenter node scaling in a completely private cluster, or in private subnets with no internet access (no IGW and no NAT gateway).

How should the docs be improved?
When creating an EKS cluster whose subnets have no route to the internet, make sure the private cluster requirements in the EKS documentation are met: https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html#private-cluster-requirements . In addition, the regional STS VPC endpoint must be created in your VPC. Without it, the Karpenter controller logs errors like the one below.

2022-01-17T07:35:04.112Z  ERROR controller.controller.metrics Reconciler error {"commit": "5047f3c", "reconciler group": "karpenter.sh", "reconciler kind": "Provisioner", "name": "on-demand-az", "namespace": "", "error": "fetching instance types using ec2.DescribeInstanceTypes, WebIdentityErr: failed to retrieve credentials\ncaused by: RequestError: send request failed\ncaused by: Post \"https://sts.ap-south-1.amazonaws.com/\": dial tcp 52.95.84.35:443: i/o timeout"}

The STS endpoint is needed because the Karpenter controller IAM role uses IRSA, per https://karpenter.sh/docs/getting-started/#create-the-karpentercontroller-iam-role . Pods configured with IAM roles for service accounts acquire credentials through an AWS Security Token Service (AWS STS) API call. If there is no outbound internet access, you must create and use an AWS STS VPC endpoint in your VPC.
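
A quick way to confirm this from inside the private subnets is a plain TCP check against the regional STS hostname. The sketch below is only an illustration: the helper names `regional_endpoint_host` and `can_connect` are made up for this example, and it assumes the STS interface endpoint has private DNS enabled so that the regional hostname resolves to the endpoint's private IPs.

```python
import socket


def regional_endpoint_host(service: str, region: str) -> str:
    """Build the regional AWS API hostname, e.g. sts.ap-south-1.amazonaws.com."""
    return f"{service}.{region}.amazonaws.com"


def can_connect(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts alike.
        return False
```

Calling `can_connect(regional_endpoint_host("sts", "ap-south-1"))` from a pod or node in the private subnets should return `False` in the situation that produces the `i/o timeout` above, and `True` once the endpoint is reachable. It only checks connectivity, not whether the credentials call itself succeeds.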

Additionally, along with the VPC endpoints required for EKS private clusters https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html#vpc-endpoints-private-clusters , Karpenter requires a VPC endpoint for SSM. This is because whenever Karpenter tries to launch a node, it queries the launch template configs and an SSM parameter. Without an SSM VPC endpoint in your VPC, the launch fails with the error below.

2022-01-17T07:49:09.992Z        INFO    controller.provisioning Waiting for unschedulable pods  {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:49:11.993Z        INFO    controller.provisioning Batched 3 pods in 1.000572709s  {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:49:12.058Z        INFO    controller.provisioning Computed packing of 1 node(s) for 3 pod(s) with instance type option(s) [c4.xlarge c6i.xlarge c5.xlarge c5d.xlarge c5a.xlarge c5n.xlarge m6i.xlarge m4.xlarge m6a.xlarge m5ad.xlarge m5d.xlarge t3.xlarge m5a.xlarge t3a.xlarge m5.xlarge r4.xlarge r3.xlarge r5ad.xlarge r6i.xlarge r5a.xlarge]        {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:51:12.445Z        ERROR   controller.provisioning Could not launch node, launching instances, getting launch template configs, getting launch templates, getting ssm parameter, RequestError: send request failed
caused by: Post "https://ssm.ap-south-1.amazonaws.com/": dial tcp 52.95.89.90:443: i/o timeout  {"commit": "5047f3c", "provisioner": "default"}
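
Both failures in this issue have the same shape: a regional AWS URL followed by `i/o timeout`. As a rough troubleshooting aid, the sketch below scans controller log text for that pattern and reports which services could not be reached. The function name `timed_out_services` and the regular expression are assumptions made for this example; they are not part of Karpenter.

```python
import re

# Matches regional AWS API URLs such as https://ssm.ap-south-1.amazonaws.com/
ENDPOINT_RE = re.compile(r"https://([a-z0-9-]+)\.([a-z0-9-]+)\.amazonaws\.com")


def timed_out_services(log_text: str) -> set:
    """Return (service, region) pairs for AWS endpoints that hit an i/o timeout."""
    found = set()
    for line in log_text.splitlines():
        # Only lines reporting a network timeout are of interest here.
        if "i/o timeout" not in line:
            continue
        for service, region in ENDPOINT_RE.findall(line):
            found.add((service, region))
    return found
```

Each `(service, region)` pair it returns corresponds to a `com.amazonaws.<region>.<service>` VPC endpoint that is likely missing or unreachable.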

So, the required VPC endpoints for a completely private EKS cluster are as follows.

com.amazonaws.<region>.ec2
com.amazonaws.<region>.ecr.api
com.amazonaws.<region>.ecr.dkr
com.amazonaws.<region>.s3 - For pulling container images
com.amazonaws.<region>.sts - For IAM roles for service accounts
com.amazonaws.<region>.ssm - If using Karpenter
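
The list above can be turned into a small checklist for a given region. This is a minimal sketch, not an official tool: `REQUIRED_SUFFIXES`, `required_endpoints`, and `missing_endpoints` are hypothetical names, and the set of existing service names is assumed to be supplied by the caller.

```python
# Service suffixes taken from the list above; "ssm" is the Karpenter-specific one.
REQUIRED_SUFFIXES = ["ec2", "ecr.api", "ecr.dkr", "s3", "sts", "ssm"]


def required_endpoints(region: str) -> list:
    """Expand the suffixes into full VPC endpoint service names for a region."""
    return [f"com.amazonaws.{region}.{suffix}" for suffix in REQUIRED_SUFFIXES]


def missing_endpoints(region: str, existing: set) -> list:
    """Return the required service names that are not in the existing set."""
    return [name for name in required_endpoints(region) if name not in existing]
```

One way to build `existing` would be to collect the `ServiceName` field of each endpoint returned by the EC2 `DescribeVpcEndpoints` API for the cluster's VPC.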

After adding the above VPC endpoints, the Karpenter controller launched nodes for the unschedulable pods.

2022-01-17T07:53:14.998Z        INFO    controller.provisioning Waiting for unschedulable pods  {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:53:17.009Z        INFO    controller.provisioning Batched 3 pods in 1.01010579s   {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:53:17.638Z        INFO    controller.provisioning Computed packing of 1 node(s) for 3 pod(s) with instance type option(s) [c4.xlarge c5d.xlarge c6i.xlarge c5a.xlarge c5.xlarge c5n.xlarge m5d.xlarge t3.xlarge m4.xlarge m5.xlarge m6a.xlarge t3a.xlarge m6i.xlarge m5ad.xlarge m5a.xlarge r3.xlarge r4.xlarge r6i.xlarge r5.xlarge r5ad.xlarge] {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:54:20.740Z        INFO    controller.provisioning Launched instance: i-01e548d243ece1782, hostname: ip-192-168-189-164.ap-south-1.compute.internal, type: c6i.xlarge, zone: ap-south-1c, capacityType: spot       {"commit": "5047f3c", "provisioner": "default"}
2022-01-17T07:54:20.784Z        INFO    controller.provisioning Bound 3 pod(s) to node ip-192-168-189-164.ap-south-1.compute.internal   {"commit": "5047f3c", "provisioner": "default"}

It would be helpful to have this added to the Karpenter documentation for the use case described above, i.e. using Karpenter with private EKS clusters.

Additional note: The Karpenter container images must also be in, or copied to, Amazon ECR or a registry inside the VPC so that they can be pulled, per https://docs.aws.amazon.com/eks/latest/userguide/private-clusters.html#container-images . The Karpenter controller and webhook pods currently use public ECR, so you would see the error below if local copies of the images are not created.

Warning  Failed     17m (x3 over 19m)     kubelet            Failed to pull image "public.ecr.aws/karpenter/controller:v0.5.3@sha256:ddd24d756cb324cf8f91f2274621646f83d6121ed6856312ca672a5f78c57174": rpc error: code = Unknown desc = Error response from daemon: Get "https://public.ecr.aws/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
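
When mirroring the images, the public reference has to be rewritten to point at the private registry. The sketch below shows one possible mapping, assuming a private ECR repository with the same path already exists in the account. The function name `private_ecr_ref` and the account ID used in the usage note are placeholders, and the digest is dropped because this example relies on the tag alone.

```python
def private_ecr_ref(public_ref: str, account_id: str, region: str) -> str:
    """Map a public.ecr.aws image reference to a private ECR reference."""
    prefix = "public.ecr.aws/"
    if not public_ref.startswith(prefix):
        raise ValueError(f"not a public ECR reference: {public_ref}")
    # Drop the digest; the tag alone identifies the copy in this sketch.
    repo_and_tag = public_ref[len(prefix):].split("@", 1)[0]
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_and_tag}"
```

For example, with the placeholder account `111122223333` in `ap-south-1`, the controller image from the error above would map to `111122223333.dkr.ecr.ap-south-1.amazonaws.com/karpenter/controller:v0.5.3`. The image still has to be copied there from a machine that has internet access.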

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@ellistarn ellistarn added the documentation Improvements or additions to documentation label Jan 17, 2022
@github-actions
Contributor

This issue is stale because it has been open 25 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Feb 22, 2022
@ellistarn ellistarn added help-wanted Extra attention is needed and removed Stale labels Feb 22, 2022
k4kratik commented Sep 2, 2022

@billrayburn
Contributor

Closing because issue is old, please reopen if needed.
