aws_eks: Cluster creation with AlbControllerOptions is running into error #22005

Closed
mrlikl opened this issue Sep 12, 2022 · 36 comments
Labels
@aws-cdk/aws-eks (Related to Amazon Elastic Kubernetes Service) · bug (This issue is a bug.) · closed-for-staleness (This issue was automatically closed because it hadn't received any attention in a while.) · p2 · response-requested (Waiting on additional info and feedback. Will move to "closing-soon" in 7 days.)

Comments

@mrlikl (Contributor) commented Sep 12, 2022

Describe the bug

While creating an EKS cluster with eks.AlbControllerOptions, creation of the custom resource Custom::AWSCDK-EKS-HelmChart fails with the error:

"Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress' "

Expected Behavior

Creation of the custom resource Custom::AWSCDK-EKS-HelmChart should be successful.

Current Behavior

Custom::AWSCDK-EKS-HelmChart fails with the error "Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress' "

Reproduction Steps

cluster = eks.Cluster(
    scope=self,
    id=construct_id,
    tags={"env": "production"},
    alb_controller=eks.AlbControllerOptions(
        version=eks.AlbControllerVersion.V2_4_1
    ),
    version=eks.KubernetesVersion.V1_21,
    cluster_logging=[
        eks.ClusterLoggingTypes.API,
        eks.ClusterLoggingTypes.AUTHENTICATOR,
        eks.ClusterLoggingTypes.SCHEDULER,
    ],
    endpoint_access=eks.EndpointAccess.PUBLIC,
    place_cluster_handler_in_vpc=True,
    cluster_name="basking-k8s",
    output_masters_role_arn=True,
    output_cluster_name=True,
    default_capacity=0,
    kubectl_environment={"MINIMUM_IP_TARGET": "100", "WARM_IP_TARGET": "100"},
)

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.40.0

Framework Version

No response

Node.js Version

16.17.0

OS

macos 12.5.1

Language

Python

Language Version

3.10.6

Other information

No response

@mrlikl mrlikl added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 12, 2022
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Sep 12, 2022
@mrlikl mrlikl changed the title eks: Cluster creation with AlbControllerOptions is running into error aws_eks: Cluster creation with AlbControllerOptions is running into error Sep 12, 2022
@pahud (Contributor) commented Oct 19, 2022

related to #19705

@pahud (Contributor) commented Oct 20, 2022


@mrlikl I was able to deploy it with cdk 2.46.0, kubernetes 1.21 and alb controller 2.4.1. Are you still having the issue?

@mrlikl (Contributor, Author) commented Oct 21, 2022

I am still getting the same error when default_capacity=0; the code in the description reproduces the error now.

@pahud pahud added investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Nov 18, 2022
@pahud (Contributor) commented Nov 18, 2022

@mrlikl I am running the following code to reproduce this error. Will let you know when the deploy completes.

import { KubectlV23Layer } from '@aws-cdk/lambda-layer-kubectl-v23';
import {
  App, Stack,
  aws_eks as eks,
  aws_ec2 as ec2,
} from 'aws-cdk-lib';

const devEnv = {
  account: process.env.CDK_DEFAULT_ACCOUNT,
  region: process.env.CDK_DEFAULT_REGION,
};

const app = new App();

const stack = new Stack(app, 'triage-dev5', { env: devEnv });

new eks.Cluster(stack, 'Cluster', {
  vpc: ec2.Vpc.fromLookup(stack, 'Vpc', { isDefault: true }),
  albController: {
    version: eks.AlbControllerVersion.V2_4_1,
  },
  version: eks.KubernetesVersion.V1_23,
  kubectlLayer: new KubectlV23Layer(stack, 'LayerVersion'),
  clusterLogging: [
    eks.ClusterLoggingTypes.API,
    eks.ClusterLoggingTypes.AUTHENTICATOR,
    eks.ClusterLoggingTypes.SCHEDULER,
  ],
  endpointAccess: eks.EndpointAccess.PUBLIC,
  placeClusterHandlerInVpc: true,
  clusterName: 'baking-k8s',
  outputClusterName: true,
  outputMastersRoleArn: true,
  defaultCapacity: 0,
  kubectlEnvironment: { MINIMUM_IP_TARGET: '100', WARM_IP_TARGET: '100' },
});

@pahud pahud assigned pahud and unassigned otaviomacedo Nov 18, 2022
@pahud (Contributor) commented Nov 18, 2022

I am getting an error with the CDK code provided above:


Lambda Log:

[ERROR] Exception: b'Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress\n'
Traceback (most recent call last):
  File "/var/task/index.py", line 17, in handler
    return helm_handler(event, context)
  File "/var/task/helm/__init__.py", line 88, in helm_handler
    helm('upgrade', release, chart, repository, values_file, namespace, version, wait, timeout, create_namespace)
  File "/var/task/helm/__init__.py", line 186, in helm
    raise Exception(output)

I am making this a P2 now and I will investigate a little bit more on this next week. If you have any possible solution please let me know. Any pull request would be highly appreciated as well.

@pahud pahud added the p2 label Nov 18, 2022
@dimmyshu commented Dec 30, 2022

I think this issue should be prioritized; a lot of other folks are running into trouble when developing in a sandbox.

I have seen many issues in this repo that set default capacity to 0 without realizing this is a bug. It really impacts development productivity, since the CloudFormation stack can take hours to roll back and clean up the resources.

@m17kea commented Jan 9, 2023

I have the same issue:

  • CDK: 2.59
  • KubernetesVersion.V1_24,
  • AlbControllerVersion.V2_4_1

The error from CloudFormation is:

Received response status [FAILED] from custom resource. Message returned: Error: b'Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress\n' Logs: /aws/lambda/TestingStage-Release-awscdkawseksK-Handler886CB40B-KG9T55a3ZdwW at invokeUserFunction (/var/task/framework.js:2:6) at processTicksAndRejections (internal/process/task_queues.js:95:5) at async onEvent (/var/task/framework.js:1:365) at async Runtime.handler (/var/task/cfn-response.js:1:1543) (RequestId: 16bb84de-c183-4e1c-9e4e-cc7ec0efc5b8)

@smislam commented Jun 22, 2023

Hey @pahud. Thank you so much for looking into this.
Were you able to make any progress? I've been struggling with this for a while. Here is my latest stack info:

    "aws-cdk-lib": "2.63.0",
    KubernetesVersion.V1_26
    AlbControllerVersion.V2_5_1

@YikaiHu commented Jul 17, 2023

Hi @pahud, I still face the same issue.

I deployed the CDK stack in the cn-north-1 region.

@YikaiHu commented Jul 17, 2023

Hi @pahud, I think I found the root cause in my scenario. It may be caused by the image not being pullable in the cn-north-1 region.

Please check:

Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": rpc error: code = Unknown desc = failed to pull and unpack image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": failed to resolve reference "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": pulling from host 602401143452.dkr.ecr.us-west-2.amazonaws.com failed with status code [manifests v2.4.1]: 401 Unauthorized


k logs aws-load-balancer-controller-75c785bc8c-72zpg -n kube-system

Error from server (BadRequest): container "aws-load-balancer-controller" in pod "aws-load-balancer-controller-75c785bc8c-72zpg" is waiting to start: trying and failing to pull image

kubectl describe pod aws-load-balancer-controller-75c785bc8c-72zpg -n kube-system

Name:                 aws-load-balancer-controller-75c785bc8c-72zpg
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      aws-load-balancer-controller
Node:                 ip-10-0-3-136.cn-north-1.compute.internal/10.0.3.136
Start Time:           Mon, 17 Jul 2023 16:30:59 +0800
Labels:               app.kubernetes.io/instance=aws-load-balancer-controller
                      app.kubernetes.io/name=aws-load-balancer-controller
                      pod-template-hash=75c785bc8c
Annotations:          kubernetes.io/psp: eks.privileged
                      prometheus.io/port: 8080
                      prometheus.io/scrape: true
Status:               Pending
IP:                   10.0.3.160
IPs:
  IP:           10.0.3.160
Controlled By:  ReplicaSet/aws-load-balancer-controller-75c785bc8c
Containers:
  aws-load-balancer-controller:
    Container ID:  
    Image:         602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
    Image ID:      
    Ports:         9443/TCP, 8080/TCP
    Host Ports:    0/TCP, 0/TCP
    Command:
      /controller
    Args:
      --cluster-name=Workshop-Cluster
      --ingress-class=alb
      --aws-region=cn-north-1
      --aws-vpc-id=vpc-0e4a9201452c76b0e
    State:          Waiting
      Reason:       ErrImagePull
    Ready:          False
    Restart Count:  0
    Liveness:       http-get http://:61779/healthz delay=30s timeout=10s period=10s #success=1 #failure=2
    Environment:
      AWS_STS_REGIONAL_ENDPOINTS:   regional
      AWS_DEFAULT_REGION:           cn-north-1
      AWS_REGION:                   cn-north-1
      AWS_ROLE_ARN:                 arn:aws-cn:iam::743271379588:role/clo-workshop-07-CLWorkshopEC2AndEKSeksClusterStack-1XO6CGEC91JGY
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /tmp/k8s-webhook-server/serving-certs from cert (ro)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jct6t (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  cert:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  aws-load-balancer-tls
    Optional:    false
  kube-api-access-jct6t:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  16m                 default-scheduler  Successfully assigned kube-system/aws-load-balancer-controller-75c785bc8c-72zpg to ip-10-0-3-136.cn-north-1.compute.internal
  Normal   Pulling    14m (x4 over 16m)   kubelet            Pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1"
  Warning  Failed     14m (x4 over 16m)   kubelet            Failed to pull image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": rpc error: code = Unknown desc = failed to pull and unpack image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": failed to resolve reference "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1": pulling from host 602401143452.dkr.ecr.us-west-2.amazonaws.com failed with status code [manifests v2.4.1]: 401 Unauthorized
  Warning  Failed     14m (x4 over 16m)   kubelet            Error: ErrImagePull
  Warning  Failed     14m (x6 over 16m)   kubelet            Error: ImagePullBackOff
  Normal   BackOff    87s (x62 over 16m)  kubelet            Back-off pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1"

@YikaiHu commented Jul 17, 2023

Seems to be related to #22520.

@YikaiHu commented Jul 17, 2023

013241004608.dkr.ecr.us-gov-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
151742754352.dkr.ecr.us-gov-east-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
558608220178.dkr.ecr.me-south-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
590381155156.dkr.ecr.eu-south-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-northeast-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-northeast-3.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-south-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-southeast-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ap-southeast-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.ca-central-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-central-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-north-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.eu-west-3.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.sa-east-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.us-east-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.us-west-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
800184023465.dkr.ecr.ap-east-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
877085696533.dkr.ecr.af-south-1.amazonaws.com/amazon/aws-load-balancer-controller:v2.4.1
918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/amazon/aws-load-balancer-controller:v2.4.1
961992271922.dkr.ecr.cn-northwest-1.amazonaws.com.cn/amazon/aws-load-balancer-controller:v2.4.1

I found a solution in kubernetes-sigs/aws-load-balancer-controller#1694: you can manually replace the ECR repository URL in the CloudFormation template.

https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases?page=2
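
In CDK terms, a similar workaround might be expressed without hand-editing the synthesized template: AlbControllerOptions accepts a repository override, so a China-region deployment could point the controller image at the regional mirror from the list above. A minimal sketch, assuming the cn-north-1 account ID and tag shown above are correct for your partition:

import aws_cdk.aws_eks as eks

# Sketch only: override the default us-west-2 repository with the
# cn-north-1 mirror listed above; verify the account ID and tag first.
alb_controller = eks.AlbControllerOptions(
    version=eks.AlbControllerVersion.V2_4_1,
    repository="918309763551.dkr.ecr.cn-north-1.amazonaws.com.cn/amazon/aws-load-balancer-controller",
)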

@mrlikl (Contributor, Author) commented Oct 1, 2023

The issue is that when the cluster is deployed with default_capacity set to 0, there are no nodes attached to it. While installing aws-load-balancer-controller via Helm, the release goes into pending-install because the pods stay pending with no nodes to schedule them on. The handler Lambda eventually times out after 15 minutes, and the event-handler Lambda retries the installation. The handler Lambda then executes helm upgrade and fails with Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress.

While this is expected as there are no nodes, I tested adding a check to the kubectl-handler that verifies whether the node count is 0 when the error is thrown, and was able to handle the error. However, I am not sure this is the right approach to solve the issue.

# Inside the helm handler's error path: swallow the "another operation in
# progress" error when the cluster has no nodes to schedule pods on yet.
if b'another operation (install/upgrade/rollback) is in progress' in output:
    cmd_to_run = ["kubectl", "get", "nodes"]
    cmd_to_run.extend(["--kubeconfig", kubeconfig])
    get_nodes_output = subprocess.check_output(cmd_to_run, stderr=subprocess.STDOUT, cwd=outdir)
    if b'No resources found' in get_nodes_output:
        return

@Karatakos commented

@pahud, out of interest, is this still on the backlog or has it been deprioritized? Calling addNodegroupCapacity on the cluster doesn't work with defaultCapacity: 0, so it's not possible to use launch templates to control capacity via CDK, as far as I've tested.

@smislam commented Oct 25, 2023

I have been stuck creating a FargateCluster with this issue since 06/22 #22005 (comment). Did 'defaultCapacity' work for you? It is not an option for Fargate.

Just tried with the latest version of CDK today and I am still having this issue. Is it possible to escalate this issue, please?

@PavanMudigondaTR commented

Could someone help me? I have the same issue. Here is my repo: https://github.com/PavanMudigondaTR/install-karpenter-with-cdk

@pahud (Contributor) commented Dec 18, 2023

It's been a while, and I am now testing the following code with the latest CDK:

export class EksStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props)

    // use my default VPC
    const vpc = getDefaultVpc(this);
    new eks.Cluster(this, 'Cluster', {
      vpc,
      albController: {
        version: eks.AlbControllerVersion.V2_6_2,
      },
      version: eks.KubernetesVersion.V1_27,
      kubectlLayer: new KubectlLayer(this, 'LayerVersion'),
      clusterLogging: [
        eks.ClusterLoggingTypes.API,
        eks.ClusterLoggingTypes.AUTHENTICATOR,
        eks.ClusterLoggingTypes.SCHEDULER,
      ],
      endpointAccess: eks.EndpointAccess.PUBLIC,
      placeClusterHandlerInVpc: true,
      clusterName: 'baking-k8s',
      outputClusterName: true,
      outputMastersRoleArn: true,
      defaultCapacity: 0,
      kubectlEnvironment: { MINIMUM_IP_TARGET: '100', WARM_IP_TARGET: '100' },
    });
  }
}

@mrlikl @Karatakos @smislam @PavanMudigondaTR: I am not sure whether your issues are related to this one, which seems to be specific to AlbController. If your case does not involve AlbController, please open a new issue and link to this one.

@YikaiHu EKS in China is a little more complicated; please open a separate issue for your case in China and link to this one. Thanks.

@pahud (Contributor) commented Dec 18, 2023

Unfortunately, I couldn't deploy it with the following code on my first attempt.

I am making this a p1 for now and will simplify the code to hopefully figure out the root cause.

export class EksStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props)

    // use my default VPC
    const vpc = getDefaultVpc(this);
    new eks.Cluster(this, 'Cluster', {
      vpc,
      albController: {
        version: eks.AlbControllerVersion.V2_6_2,
      },
      version: eks.KubernetesVersion.V1_27,
      kubectlLayer: new KubectlLayer(this, 'LayerVersion'),
      clusterLogging: [
        eks.ClusterLoggingTypes.API,
        eks.ClusterLoggingTypes.AUTHENTICATOR,
        eks.ClusterLoggingTypes.SCHEDULER,
      ],
      endpointAccess: eks.EndpointAccess.PUBLIC,
      placeClusterHandlerInVpc: true,
      clusterName: 'baking-k8s',
      outputClusterName: true,
      outputMastersRoleArn: true,
      defaultCapacity: 0,
      kubectlEnvironment: { MINIMUM_IP_TARGET: '100', WARM_IP_TARGET: '100' },
    });
  }
}

@pahud pahud added p1 and removed p2 labels Dec 18, 2023
github-actions bot commented

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Dec 21, 2023
@PavanMudigondaTR commented

The issue still persists. Please don't close the ticket, bot.

@github-actions github-actions bot removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Dec 21, 2023
@smislam commented Dec 21, 2023

Hey @pahud, thank you so much for looking into this. I concur that the issue still persists. Here is the error:

Node: v20.10.0
Npm: 10.2.5
"aws-cdk-lib": "^2.115.0"
KubernetesVersion.V1_28
AlbControllerVersion.V2_6_2

EksClusterStack | 26/28 | 9:06:12 AM | CREATE_FAILED | Custom::AWSCDK-EKS-HelmChart | EksClusterStackEksCluster922FB9AE-AlbController/Resource/Resource/Default (EksClusterStackEksCluster922FB9AEAlbController1636C356) Received response status [FAILED] from custom resource. Message returned: Error: b'Release "aws-load-balancer-controller" does not exist. Installing it now.\nError: looks like "https://aws.github.io/eks-charts" is not a valid chart repository or cannot be reached: Get "https://aws.github.io/eks-charts/index.yaml": dial tcp 185.199.110.153:443: connect: connection timed out\n'

When I add your suggestion cluster.albController?.node.addDependency(cluster.defaultNodegroup!);, I get the following error:

$eks-cluster\node_modules\constructs\src\dependency.ts:91 const ret = (instance as any)[DEPENDABLE_SYMBOL]; ^ TypeError: Cannot read properties of undefined (reading 'Symbol(@aws-cdk/core.DependableTrait)')
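
For what it's worth, the TypeError is consistent with defaultNodegroup being undefined when defaultCapacity is 0 (or on a FargateCluster, which has no default nodegroup). A guarded version of that suggestion, shown here as a Python sketch for consistency with the rest of this issue, only wires the dependency when both constructs exist; it avoids the TypeError but does not by itself fix the underlying Helm failure:

# Sketch only: add the AlbController -> default nodegroup dependency
# when both exist; default_nodegroup is None when default_capacity=0
# or on a FargateCluster, which is what triggers the TypeError above.
if cluster.alb_controller is not None and cluster.default_nodegroup is not None:
    cluster.alb_controller.node.add_dependency(cluster.default_nodegroup)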

@smislam commented Dec 21, 2023

@pahud, @mrlikl, et al.,

I was able to resolve the issue. What I found is that, to install the controller, the Helm chart is fetched from the eks-charts repository hosted by Kubernetes SIGs. To access those files, the cluster must have egress. In my case, I was creating my cluster in a private isolated subnet. You need to create your cluster in a subnet with egress: SubnetType.PRIVATE_WITH_EGRESS.

Please update your cluster and VPC configurations to see if this resolves it for you. My stack completed successfully.
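
As a rough Python sketch of that setup (construct IDs and sizing are illustrative, and this assumes it runs inside a Stack's constructor where self is the stack): a VPC whose private subnets route out through a NAT gateway, with the cluster placed in those subnets so the Helm handler can reach https://aws.github.io/eks-charts.

import aws_cdk.aws_ec2 as ec2
import aws_cdk.aws_eks as eks

# Sketch only: private subnets with egress via a NAT gateway.
vpc = ec2.Vpc(
    self, "EksVpc",
    nat_gateways=1,
    subnet_configuration=[
        ec2.SubnetConfiguration(name="public", subnet_type=ec2.SubnetType.PUBLIC),
        ec2.SubnetConfiguration(name="private", subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
    ],
)

cluster = eks.Cluster(
    self, "eks-cluster",
    version=eks.KubernetesVersion.V1_28,
    vpc=vpc,
    vpc_subnets=[ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS)],
    alb_controller=eks.AlbControllerOptions(version=eks.AlbControllerVersion.V2_6_2),
)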

@pahud (Contributor) commented Dec 26, 2023

Thank you @smislam for the insights.

@pahud pahud added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 26, 2023
github-actions bot commented

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Dec 28, 2023
@andreprawira commented

@smislam, SubnetType.PRIVATE_WITH_EGRESS causes RuntimeError: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Deprecated_Isolated,Public

@pahud, I'm still getting the same error with my Python code even with default_capacity set. Do you know what I'm missing?

vpc = ec2.Vpc.from_lookup(self, "VPCLookup", vpc_id=props.vpc_id)

# provisioning a cluster
cluster = eks.Cluster(
    self,
    "eks-cluster",
    version=eks.KubernetesVersion.V1_28,
    kubectl_layer=lambda_layer_kubectl_v28.KubectlV28Layer(self, "kubectl-layer"),
    cluster_name=f"{props.customer}-eks-cluster",
    default_capacity_instance=ec2.InstanceType("t3.medium"),
    default_capacity=2,
    alb_controller=eks.AlbControllerOptions(version=eks.AlbControllerVersion.V2_6_2),
    vpc=vpc,
    vpc_subnets=[ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_ISOLATED)],
    masters_role=iam.Role(self, "masters-role", assumed_by=iam.AccountRootPrincipal()),
)

@github-actions github-actions bot removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Dec 29, 2023
@pahud (Contributor) commented Dec 29, 2023

@andreprawira

For some reason it will fail if vpc_subnets selection is ec2.SubnetType.PRIVATE_ISOLATED as described in #22005 (comment).

RuntimeError: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Deprecated_Isolated,Public

This means CDK can't find any "private with egress" subnets in your VPC. Can you make sure you have private subnets with egress (typically via a NAT gateway)?

@pahud pahud added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 29, 2023
@smislam commented Dec 29, 2023

@andreprawira, it looks like you are using a VPC (already created in another stack) that doesn't have a private subnet with egress, and that is why you are getting that error.

vpc = ec2.Vpc.from_lookup(self, "VPCLookup", vpc_id=props.vpc_id)

You will not be able to use CDK to create your stack with that configuration, for the reason I mentioned earlier in my comment. So either update your VPC to add a new private subnet with egress, or create an entirely new VPC with SubnetType.PRIVATE_WITH_EGRESS. This will require a NAT (either a gateway or an instance), as @pahud mentioned.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Dec 30, 2023
@andreprawira commented Dec 30, 2023

@pahud @smislam, we have a product in our Service Catalog that deploys a VPC and IGW to all of our accounts. Within that product we don't use a NAT gateway; rather, we use a TGW in our network account (meaning all traffic goes in and out through the network account, even for the VPCs in the other accounts). That is why I used a VPC lookup: it has already been created.

That being said, is there another way for me to use alb_controller with the VPC, TGW, and IGW already set up as is? By the way, I hope I'm not misunderstanding you when you say I can't use ec2.SubnetType.PRIVATE_ISOLATED, because if I look at my cluster, I can see the subnets it uses are all private subnets (the route tables for those subnets route traffic to the TGW in the network account and do not route traffic to the IGW).

Furthermore, using vpc_subnets=[ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS)] causes RuntimeError: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Deprecated_Isolated,Public. To answer your question @pahud: I could be wrong, but I don't think I have private subnets with egress if egress requires a NAT gateway; I have a TGW instead. Shouldn't that work as well?

How do I use ec2.SubnetType.PRIVATE_WITH_EGRESS with a TGW instead of a NAT gateway?

@smislam commented Dec 30, 2023

@andreprawira, your setup should work. There is a bug in older versions of CDK related to Transit Gateway; I ran into it a while back. Any chance you are using an older version of CDK?
Can you please try with the latest version?

@andreprawira commented

@smislam, I just updated my CDK from version 2.115.0 to 2.117.0, and below is my code:

vpc = ec2.Vpc.from_lookup(self, "VPCLookup", vpc_id=props.vpc_id)

# provisioning a cluster
cluster = eks.Cluster(
    self,
    "eks-cluster",
    version=eks.KubernetesVersion.V1_28,
    kubectl_layer=lambda_layer_kubectl_v28.KubectlV28Layer(self, "kubectl-layer"),
    # place_cluster_handler_in_vpc=True,
    cluster_name=f"{props.customer}-eks-cluster",
    default_capacity_instance=ec2.InstanceType("t3.medium"),
    default_capacity=2,
    alb_controller=eks.AlbControllerOptions(version=eks.AlbControllerVersion.V2_6_2),
    vpc=vpc,
    vpc_subnets=[ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS)],
    # masters_role=iam.Role(self, "masters-role", assumed_by=iam.AccountRootPrincipal()),
)

but I am still getting the same RuntimeError: There are no 'Private' subnet groups in this VPC. Available types: Isolated,Deprecated_Isolated,Public

@smislam commented Dec 30, 2023

That is strange. I am not sure what is happening @andreprawira. We will need @pahud and the AWS CDK team to look deeper into this. Happy coding and a happy New Year!

@pahud (Contributor) commented Jan 2, 2024

@andreprawira

I think you can still use private isolated subnets for vpc_subnets, as below:

vpc_subnets=[ec2.SubnetSelection(subnet_type=ec2.SubnetType.PRIVATE_ISOLATED)],

But if you look at the synthesized template, there is a chance that:

  1. Your kubectl handler Lambda function is associated with the isolated subnets, which means:
    a. it may not be able to reach the AWS EKS API endpoint over the public internet unless the isolated subnets have the relevant VPC endpoints enabled, and
    b. it may not be able to reach the cluster endpoint if that endpoint is public only.
  2. Your nodegroup may be deployed in the isolated subnets and may not be able to pull images from ECR Public unless the relevant VPC endpoints or a proxy are configured.

Technically it is possible to deploy an EKS cluster with isolated subnets, but there are a lot of requirements to consider. We don't have a working sample for now and will need more feedback from the community before we can document how to do that.
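
As a rough illustration of what "relevant VPC endpoints" can mean for an isolated-subnet cluster (a non-exhaustive Python sketch; the exact set depends on your workloads, and the cluster's EKS API endpoint must also be reachable, for example via a private cluster endpoint):

import aws_cdk.aws_ec2 as ec2

# Sketch only: endpoints commonly needed so nodes and the kubectl handler
# can reach ECR, S3 (image layers), STS, EC2 and CloudWatch Logs without
# internet egress. `vpc` is assumed to be the isolated VPC in question.
vpc.add_gateway_endpoint("S3", service=ec2.GatewayVpcEndpointAwsService.S3)
vpc.add_interface_endpoint("EcrApi", service=ec2.InterfaceVpcEndpointAwsService.ECR)
vpc.add_interface_endpoint("EcrDocker", service=ec2.InterfaceVpcEndpointAwsService.ECR_DOCKER)
vpc.add_interface_endpoint("Sts", service=ec2.InterfaceVpcEndpointAwsService.STS)
vpc.add_interface_endpoint("Ec2", service=ec2.InterfaceVpcEndpointAwsService.EC2)
vpc.add_interface_endpoint("Logs", service=ec2.InterfaceVpcEndpointAwsService.CLOUDWATCH_LOGS)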

We have a p1 tracking issue for EKS clusters with isolated-subnet support at #12171; we will need to close that first, but it should not be relevant to the ALB controller.

@pahud pahud added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Jan 4, 2024

github-actions bot commented Jan 5, 2024

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness This issue was automatically closed because it hadn't received any attention in a while. and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Jan 5, 2024