
fix(eks): cluster-resource-handler fails to verify delete operations #26375

Closed
wants to merge 9 commits

Conversation


@kishiel kishiel commented Jul 14, 2023

Motivation

The recent upgrade of the custom resource handler's underlying aws-sdk-js dependency caused deletion events to fail. The response structure differs between SDK versions, and the new response lacked the property the handler was previously targeting.

This change leverages the HTTP response status code to follow the same logic as before.
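As a rough illustration of the idea (this is not the actual handler code; the interface and function names below are hypothetical): aws-sdk-js v3 responses expose the status via `$metadata.httpStatusCode`, so a delete-verification check can key off that field rather than a v2-era response property:

```typescript
// Sketch only: a hypothetical delete-verification check against an
// SDK v3-style response shape. The real handler's types and logic differ.

// AWS SDK for JavaScript v3 command responses carry metadata like this:
interface SdkV3Response {
  $metadata: { httpStatusCode?: number };
}

// Treat the delete call as accepted when the API answered with a 2xx status.
function isDeleteAccepted(response: SdkV3Response): boolean {
  const status = response.$metadata.httpStatusCode;
  return status !== undefined && status >= 200 && status < 300;
}

console.log(isDeleteAccepted({ $metadata: { httpStatusCode: 200 } })); // true
console.log(isDeleteAccepted({ $metadata: {} }));                      // false
```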

Some additional changes that come with this fix:

  • Nginx dashboard's latest update breaks the cluster tests, and as a result I've removed them
  • Nginx ingress controller was also removed as it is affected by a separate new defect either in EKS or VPC (I can't tell) which causes NLBs created by the EKS cluster to survive stack deletion. This appears to be a transient issue, but it even appeared in the simplistic hello-k8s chart.

There's a broader question I have about the use of external Helm charts in these snapshot tests: because they don't target specific versions, the snapshot tests, which are supposed to be deterministic, become nondeterministic. Has the EKS team considered owning a few simple Helm chart repos that would provide us with stable versions (and thus deterministic tests)?

I spent nearly 5 days fighting with the snapshot tests and found that a few tests (eks-cluster-test, bottlerocket, and cluster-imported test) would never use new assets and would instead deploy the old ones. If I synthesized the test using `npx cdk -a test/integ.testname.js` and deployed that, it would use the new assets. The only way I got past this was to completely delete the existing snapshot directory. I couldn't find references to this behavior anywhere, but I wanted to call it out in case I deleted something that was not replaced by the snapshot update.

Remaining work

The unit tests for the cluster and fargate handlers were not updated. I took a stab at them, but the interface mocks we have are no longer accurate, and I couldn't get them to mimic this change.

Fixes #26325


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added bug This issue is a bug. effort/medium Medium work item – several days of effort p2 beginning-contributor [Pilot] contributed between 0-2 PRs to the CDK labels Jul 14, 2023
@aws-cdk-automation aws-cdk-automation requested a review from a team July 14, 2023 23:47
@aws-cdk-automation
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 8eb6cce
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository


mrgrain commented Jul 17, 2023

This is great @kishiel thanks for looking into this. Can you check why the unit tests fail?


mrgrain commented Jul 17, 2023

Thanks again @kishiel. Gonna close this, as the team is looking into this with priority now.
Reference: #26283
