aws-eks: Cannot update cluster endpoint access #21439

Closed
resnikb opened this issue Aug 3, 2022 · 8 comments · Fixed by #22957
Assignees
Labels
@aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service bug This issue is a bug. effort/medium Medium work item – several days of effort p2

Comments

@resnikb

resnikb commented Aug 3, 2022

Describe the bug

Changing endpoint access for an existing EKS cluster fails.

I have an existing EKS cluster with

endpointAccess: EndpointAccess.PUBLIC_AND_PRIVATE

If I change that to

endpointAccess: EndpointAccess.PRIVATE

CloudFormation update fails with error:

Received response status [FAILED] from custom resource. Message returned: Only one type of update can be allowed.

Expected Behavior

The update should succeed, and change the endpoint access to PRIVATE.

Current Behavior

Cluster update fails with:

Received response status [FAILED] from custom resource. Message returned: Only one type of update can be allowed.

Reproduction Steps

Create a new cluster with logging enabled:

   new Cluster(stack, 'Cluster', {
      clusterName: 'MyCluster',
      version: KubernetesVersion.V1_21,
      endpointAccess: EndpointAccess.PUBLIC_AND_PRIVATE,
      vpc,
      clusterLogging: [
        ClusterLoggingTypes.API,
        ClusterLoggingTypes.AUDIT,
        ClusterLoggingTypes.AUTHENTICATOR,
        ClusterLoggingTypes.CONTROLLER_MANAGER,
        ClusterLoggingTypes.SCHEDULER,
      ],
    });

Then change endpoint access to EndpointAccess.PRIVATE and redeploy.

Possible Solution

The cluster handler lambda specifies logging configuration even when only endpoint access needs to be updated. If logging configuration doesn't need updating, it should not be specified in the call to updateClusterConfig.
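A minimal sketch of that idea, assuming simplified stand-in types rather than the real handler or AWS SDK types (`UpdateFlags`, `ClusterProps`, `UpdateConfigRequest` and `buildUpdateConfig` are hypothetical names used only for illustration). It adds each section to the request only when the corresponding flag is set, and refuses to combine the two, since EKS rejects the combined request with "Only one type of update can be allowed."

```typescript
// Illustrative stand-ins; not the actual cluster handler or SDK types.
interface UpdateFlags {
  updateLogging: boolean;
  updateAccess: boolean;
}

interface ClusterProps {
  logging?: { clusterLogging: { types: string[]; enabled: boolean }[] };
  resourcesVpcConfig: {
    endpointPrivateAccess?: boolean;
    endpointPublicAccess?: boolean;
    publicAccessCidrs?: string[];
  };
}

interface UpdateConfigRequest {
  name: string;
  logging?: ClusterProps['logging'];
  resourcesVpcConfig?: ClusterProps['resourcesVpcConfig'];
}

// Builds the request body for a single updateClusterConfig call.
function buildUpdateConfig(
  name: string,
  updates: UpdateFlags,
  props: ClusterProps,
): UpdateConfigRequest {
  // A request carrying both logging and access changes is rejected, so fail early.
  if (updates.updateLogging && updates.updateAccess) {
    throw new Error('Only one type of update (logging or access) can be sent per request.');
  }

  const config: UpdateConfigRequest = { name };

  // Include logging only when logging actually changed.
  if (updates.updateLogging) {
    config.logging = props.logging;
  }

  // Include only the access fields when access changed.
  if (updates.updateAccess) {
    config.resourcesVpcConfig = {
      endpointPrivateAccess: props.resourcesVpcConfig.endpointPrivateAccess,
      endpointPublicAccess: props.resourcesVpcConfig.endpointPublicAccess,
      publicAccessCidrs: props.resourcesVpcConfig.publicAccessCidrs,
    };
  }

  return config;
}
```

With `updateAccess: true` and `updateLogging: false`, as in the logs in this report, the resulting request would contain `name` and `resourcesVpcConfig` but no `logging` key.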

Additional Information/Context

CloudWatch logs for lambda execution:

2022-08-03T11:07:30.872Z	c627c94c-6ee5-4193-9d6b-18f1b376637a	INFO	onUpdate: {
    "updates": {
        "replaceName": false,
        "replaceVpc": false,
        "updateAccess": true,
        "replaceRole": false,
        "updateVersion": false,
        "updateEncryption": false,
        "updateLogging": false
    }
}

2022-08-03T11:07:31.798Z	c627c94c-6ee5-4193-9d6b-18f1b376637a	INFO	[AWS eks 400 0.925s 0 retries] updateClusterConfig({
  name: '<redacted>',
  logging: {
    clusterLogging: [
      {
        types: [
          'api',
          'audit',
          'authenticator',
          'controllerManager',
          'scheduler',
          [length]: 5
        ],
        enabled: true
      },
      { types: [ [length]: 0 ], enabled: true },
      [length]: 2
    ]
  },
  resourcesVpcConfig: {
    endpointPrivateAccess: true,
    endpointPublicAccess: false,
    publicAccessCidrs: undefined
  },
  clientRequestToken: '<redacted>'
})

2022-08-03T11:07:31.812Z	c627c94c-6ee5-4193-9d6b-18f1b376637a	ERROR	Invoke Error 	{
    "errorType": "InvalidParameterException",
    "errorMessage": "Only one type of update can be allowed.",
    "code": "InvalidParameterException",
    "message": "Only one type of update can be allowed.",
    "time": "2022-08-03T11:07:31.797Z",
    "requestId": "<redacted>",
    "statusCode": 400,
    "retryable": false,
    "retryDelay": 23.61820879655758,
    "stack": [
        "InvalidParameterException: Only one type of update can be allowed.",
        "    at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)",
        "    at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:49:8)",
        "    at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)",
        "    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)",
        "    at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)",
        "    at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)",
        "    at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)",
        "    at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10",
        "    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)",
        "    at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12)"
    ]
}

CDK CLI Version

2.35.0 (build 5c23578)

Framework Version

No response

Node.js Version

v16.16.0

OS

Linux

Language

TypeScript

Language Version

No response

Other information

No response

@resnikb resnikb added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 3, 2022
@github-actions github-actions bot added the @aws-cdk/aws-eks Related to Amazon Elastic Kubernetes Service label Aug 3, 2022
@resnikb
Author

resnikb commented Aug 3, 2022

On further inspection, this might be related to #21436 and the changes made in #21185, as the clusterLogging JSON in the call to updateClusterConfig is now invalid.

@juweeks

juweeks commented Aug 9, 2022

We're getting this too now, on v2.33.

@johnnyhuy

One way we've worked around it is to turn off logging in a first deploy, switch the endpoint access in a second, and then turn logging back on in a third. It's not ideal: it doesn't resolve the root issue, and we end up doing three deploys instead of one.
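The three-deploy workaround can be sketched as a sequence of cluster settings in which each deploy changes only one kind of setting. This is only an illustration: plain strings stand in for the CDK `EndpointAccess` and `ClusterLoggingTypes` values, and `PassProps`, `passes` and `changedKinds` are hypothetical names, not CDK APIs.

```typescript
// Plain strings stand in for the CDK enums; this is not CDK code.
type Access = 'PUBLIC_AND_PRIVATE' | 'PRIVATE';

interface PassProps {
  endpointAccess: Access;
  clusterLogging: string[];
}

const ALL_LOGGING = ['api', 'audit', 'authenticator', 'controllerManager', 'scheduler'];

// State of the cluster before the workaround starts.
const initial: PassProps = { endpointAccess: 'PUBLIC_AND_PRIVATE', clusterLogging: ALL_LOGGING };

// One entry per deploy.
const passes: PassProps[] = [
  { endpointAccess: 'PUBLIC_AND_PRIVATE', clusterLogging: [] }, // deploy 1: logging off
  { endpointAccess: 'PRIVATE', clusterLogging: [] },            // deploy 2: switch access
  { endpointAccess: 'PRIVATE', clusterLogging: ALL_LOGGING },   // deploy 3: logging back on
];

// Reports which kinds of settings differ between two consecutive states.
function changedKinds(before: PassProps, after: PassProps): string[] {
  const kinds: string[] = [];
  if (before.endpointAccess !== after.endpointAccess) {
    kinds.push('access');
  }
  if (before.clusterLogging.join(',') !== after.clusterLogging.join(',')) {
    kinds.push('logging');
  }
  return kinds;
}
```

Comparing each state with the one before it yields exactly one changed kind per deploy, which is why no single step asks for both a logging and an access update.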

@pahud pahud added the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Nov 14, 2022
@pahud
Contributor

pahud commented Nov 16, 2022

I can reproduce this issue in cdk v2.50.0 and I'm assigning this to myself as p2. I'll look into it for the root cause and investigate if there's anything we can do to fix it.

@pahud pahud self-assigned this Nov 16, 2022
@pahud pahud added p2 effort/medium Medium work item – several days of effort and removed needs-triage This issue or PR still needs to be triaged. labels Nov 16, 2022
@pahud
Contributor

pahud commented Nov 16, 2022

I think we should probably fix it here:

    if (updates.updateLogging || updates.updateAccess) {
      const config: aws.EKS.UpdateClusterConfigRequest = {
        name: this.clusterName,
        logging: this.newProps.logging,
      };
      if (updates.updateAccess) {
        // Updating the cluster with securityGroupIds and subnetIds (as specified in the warning here:
        // https://awscli.amazonaws.com/v2/documentation/api/latest/reference/eks/update-cluster-config.html)
        // will fail, therefore we take only the access fields explicitly
        config.resourcesVpcConfig = {
          endpointPrivateAccess: this.newProps.resourcesVpcConfig.endpointPrivateAccess,
          endpointPublicAccess: this.newProps.resourcesVpcConfig.endpointPublicAccess,
          publicAccessCidrs: this.newProps.resourcesVpcConfig.publicAccessCidrs,
        };
      }
      const updateResponse = await this.eks.updateClusterConfig(config);

According to the lambda logs:

{
    "updates": {
        "replaceName": false,
        "replaceVpc": false,
        "updateAccess": true,
        "replaceRole": false,
        "updateVersion": false,
        "updateEncryption": false,
        "updateLogging": false
    }
}

We actually only need to update access here, so it's not clear to me why we should have this line:

@pahud
Contributor

pahud commented Nov 17, 2022

Just created a PR draft for a quick fix #22957

With it, I can successfully update the stack by changing only the endpoint access, like this:

   new eks.Cluster(this, 'Cluster', {
      vpc,
      endpointAccess: eks.EndpointAccess.PRIVATE,
      version: eks.KubernetesVersion.V1_23,
      clusterLogging: [
        eks.ClusterLoggingTypes.API,
        eks.ClusterLoggingTypes.AUDIT,
        eks.ClusterLoggingTypes.AUTHENTICATOR,
        eks.ClusterLoggingTypes.CONTROLLER_MANAGER,
        eks.ClusterLoggingTypes.SCHEDULER,
      ],
    });

I will look into previous commits to see if I missed anything.

@pahud pahud removed the investigating This issue is being investigated and/or work is in progress to resolve the issue. label Nov 17, 2022
@jaredhancock31

jaredhancock31 commented Dec 5, 2022

We're seeing a similar problem when mutating the list of allowAccessFrom CIDRs.

Example:

  1. Deploy a cluster with something like the following:
     # assume clusterLogging also enabled here
      allowAccessFrom:
        - 2.4.6.0/24
  2. Then, try to update it by adding 2 entries:
     # assume clusterLogging is still the same as before, no delta
      allowAccessFrom:
        - 2.4.6.0/24
        - 1.2.3.4/32
        - 3.3.3.3/32
  3. Observe the following error:
4:59:47 PM | UPDATE_FAILED        | Custom::AWSCDK-EKS-Cluster            | EKSClusterE11008B6
Received response status [FAILED] from custom resource. Message returned: Only one type of update can be allowed.

Logs: /aws/lambda/redacted-name-awscdkawseks-OnEventHandler42BEBAE0-5fCPxU8lMELN

at Object.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/json.js:52:27)
at Request.extractError (/var/runtime/node_modules/aws-sdk/lib/protocol/rest_json.js:49:8)
at Request.callListeners (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
at Request.emit (/var/runtime/node_modules/aws-sdk/lib/request.js:686:14)
at Request.transition (/var/runtime/node_modules/aws-sdk/lib/request.js:22:10)
at AcceptorStateMachine.runTo (/var/runtime/node_modules/aws-sdk/lib/state_machine.js:14:12)
at /var/runtime/node_modules/aws-sdk/lib/state_machine.js:26:10
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:38:9)
at Request.<anonymous> (/var/runtime/node_modules/aws-sdk/lib/request.js:688:12) (RequestId: 7e00f099-29a7-4f0e-8bc9-4e9a7e552922)

This was tested using cdk 2.50.
This will block a lot of our automation from progressing.

@mergify mergify bot closed this as completed in #22957 Dec 20, 2022
mergify bot pushed a commit that referenced this issue Dec 20, 2022
…#22957)

This PR addresses the following known issues:

1. When only the cluster endpoint access type is updated and logging is predefined yet unchanged, the cluster-resource-handler updates both logging and access, which is not allowed and throws the SDK error. This PR fixes that so only the access type is updated, which is allowed.
2. When updating the cluster endpoint public CIDRs with a set of exactly the same size, the `setsEqual` function should return correctly.
3. When updating the cluster endpoint public access from one CIDR to multiple CIDRs with logging predefined yet unchanged, the update should return correctly.
4. Updating both access and logging now throws an error from the CDK custom resource.

This PR is just a temporary fix that does not implement multiple operations in the cluster-resource-handler custom resource provider (i.e. updating both logging and access).
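
Item 2 suggests that the public CIDR comparison has to be by membership and not only by size. A minimal sketch of such a comparison follows; it is an illustration written for this thread, not the code from the PR, and only the name `setsEqual` is taken from the description above.

```typescript
// Returns true only when both sets contain exactly the same members.
function setsEqual<T>(first: Set<T>, second: Set<T>): boolean {
  // Sets of different sizes can never be equal.
  if (first.size !== second.size) {
    return false;
  }
  // Same size: equal only if every member of the first is also in the second.
  return Array.from(first).every((item) => second.has(item));
}
```

Under this sketch, `{2.4.6.0/24}` and `{1.2.3.4/32}` have the same size but are reported as different, so replacing one CIDR with another would still be detected as an access change.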

Fixes: #21439

----

### All Submissions:

* [x] Have you followed the guidelines in our [Contributing guide?](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md)

### Adding new Unconventional Dependencies:

* [ ] This PR adds new unconventional dependencies following the process described [here](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md/#adding-new-unconventional-dependencies)

### New Features

* [ ] Have you added the new feature to an [integration test](https://github.com/aws/aws-cdk/blob/main/INTEGRATION_TESTS.md)?
	* [ ] Did you use `yarn integ` to deploy the infrastructure and generate the snapshot (i.e. `yarn integ` without `--dry-run`)?

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

brennanho pushed a commit to brennanho/aws-cdk that referenced this issue Jan 20, 2023
brennanho pushed a commit to brennanho/aws-cdk that referenced this issue Feb 22, 2023
6 participants