Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(custom-resource): node18 doesn't import sdk v2, and framework depends on sdk v2. #26732

Closed
ahammond opened this issue Aug 11, 2023 · 8 comments · Fixed by #26763
Closed

(custom-resource): node18 doesn't import sdk v2, and framework depends on sdk v2. #26732

ahammond opened this issue Aug 11, 2023 · 8 comments · Fixed by #26763
Labels
@aws-cdk/custom-resources Related to AWS CDK Custom Resources bug This issue is a bug. p1

Comments

@ahammond
Copy link
Contributor

Describe the bug

Received response status [FAILED] from custom resource. Message returned: Error: Cannot find module 'aws-sdk' Require stack: - /var/task/index.js - /var/runtime/index.mjs Logs: /aws/lambda/GlobalProdProdUsEast22-Do-ProdUsEast22Dochistorypr-94AvLumRaUIU Require stack: - /var/task/index.js - /var/runtime/index.mjs at _loadUserApp (file:///var/runtime/index.mjs:997:17) at async UserFunction.js.module.exports.load (file:///var/runtime/index.mjs:1029:21) at async start (file:///var/runtime/index.mjs:1192:23) at async file:///var/runtime/index.mjs:1198:1 (RequestId: 87ecf43c-da3d-4806-86b9-c03465a20501)

Upgraded to v2.91.0, issue persists (I did some yarn.lock surgery to make sure only v2.91.0 was present). Downgraded to v2.86.0 and managed to unblock by deploying from my laptop.

I assume it's related to #26325

Expected Behavior

I expected to not have to ever think about the custom resource provider framework. Instead... they break when upgraded to node18 because of bundling issues? Why not upgraded them to use sdk v3 before changing them to node18?!?

Current Behavior

Breaks huge swaths of custom resources. Super awesome when combined with older stacks which will fail to roll back because they were previously on node12. I can't emphasize enough how much CustomResources are both essential and yet also horrible when they have issues. Thankfully the only stack I've fully trashed with this was a test system, but wow. Can CDK / Cfn be trusted to deploy resources which store data?

Reproduction Steps

Working: https://github.com/ahammond/repro-node18-bundling-cdk

switch branch to https://github.com/ahammond/repro-node18-bundling-cdk/tree/with-cdk-latest

and it fails.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.87.0 breaks things

Framework Version

No response

Node.js Version

v16.20.

OS

MacOS

Language

Typescript

Language Version

5.1.3

Other information

No response

@ahammond ahammond added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 11, 2023
@github-actions github-actions bot added the @aws-cdk/custom-resources Related to AWS CDK Custom Resources label Aug 11, 2023
@evgenyka evgenyka added p1 needs-review and removed needs-triage This issue or PR still needs to be triaged. labels Aug 13, 2023
@rix0rrr
Copy link
Contributor

rix0rrr commented Aug 14, 2023

It does not look like I can install your repro:

$ npm install
npm ERR! code E401
npm ERR! 401 Unauthorized - GET https://npm.pkg.github.com/download/@time-loop/cdk-aurora/1.23.0/9d3aeef42506f270e3daf028c20e9f6e824a593b - authentication token not provided

$ yarn install
error An unexpected error occurred: "https://npm.pkg.github.com/download/@time-loop/cdk-aurora/1.23.0/9d3aeef42506f270e3daf028c20e9f6e824a593b: Request failed \"401 Unauthorized\"".

@rix0rrr
Copy link
Contributor

rix0rrr commented Aug 14, 2023

Can you clarify a little on the behavior you're seeing?

  • Does this happen also on fresh deploys, or only on upgrades?
  • You didn't say what resource this happens on. I assume it's in the execution of the Custom Resource Provider?
  • What phase of execution does this happen in? Update? Rollback? Cleanup? Could you send over a representative fragment of the CloudFormation Event History around this period?

@ahammond
Copy link
Contributor Author

@rix0rrr Hmm... the repo is Public. Do you have something like

//npm.pkg.github.com/:_authToken=ghp_0123456789abcdefgh
@time-loop:registry=https://npm.pkg.github.com/

In your ~/.npmrc? I believe you have to be auth'd with github to pull any repo, even if it's public.

This happens for both fresh deploys and upgrades. With upgrades it can be particularly devastating since rollbacks to node12 stuff appear to fail.

I think it's either the custom resource lambda or the provider lambda, but I don't know which.

Event history (from my sandbox account).

2023-08-11 15:21:47 UTC-0700 | ReproNode18BundlingDatabaseInstance20172511C | CREATE_FAILED | Resource creation cancelled
-- | -- | -- | --
2023-08-11 15:21:45 UTC-0700 | ReproNode18BundlingDatabaseProvisioner944F24F6 | CREATE_FAILED | Received response status [FAILED] from custom resource. Message returned: Error: Cannot find module 'aws-sdk' Require stack: - /var/task/index.js - /var/runtime/index.mjs Logs: /aws/lambda/ReproNode18Bundling-ReproNode18Bundlingprovisionda-bGB6c1mKCOf1 Require stack: - /var/task/index.js - /var/runtime/index.mjs at _loadUserApp (file:///var/runtime/index.mjs:997:17) at async UserFunction.js.module.exports.load (file:///var/runtime/index.mjs:1029:21) at async start (file:///var/runtime/index.mjs:1192:23) at async file:///var/runtime/index.mjs:1198:1 (RequestId: e92249f7-efea-4d22-9f44-926fc1ed6aaa)
2023-08-11 15:21:44 UTC-0700 | ReproNode18BundlingDatabaseProvisioner944F24F6 | CREATE_IN_PROGRESS | Resource creation Initiated
2023-08-11 15:21:44 UTC-0700 | ReproNode18BundlingUserProviderframeworkonEventLogRetention4F8199F5 | CREATE_COMPLETE | -
2023-08-11 15:21:44 UTC-0700 | ReproNode18BundlingUserProviderframeworkonEventLogRetention4F8199F5 | CREATE_IN_PROGRESS | Resource creation Initiated

@MrArnoldPalmer
Copy link
Contributor

So I think issue is @time-loop/cdk-named-environments is not a public package? At least for me that's what I'm getting when trying to install deps from the repro repo. I did double check I'm authenticated with the gh npm registry for the @time-loop scope.

Any custom-resource handler that is vended in aws-cdk-lib that is deployed to Node18 should be using aws-sdk v3. The provider framework runtime included. Looking through @time-loop/cdk-aurora, I'm thinking there may be a regression with aws-lambda-nodejs. Since the default runtime version if not passed by the user is now Node18, if aws-sdk (v2) is a dependency it should be included when bundling via BundlingOptions.

@MrArnoldPalmer
Copy link
Contributor

or it looks like you're explicitly externalizing aws-sdk here https://github.com/time-loop/cdk-aurora/blob/main/src/aurora.ts#L416

@rix0rrr should we just revert lambda-nodejs's default runtime to Node16? I think that's the only way to not break with certain bundling configs like above. We can feature-flag it so newly created lambda-nodejs functions default to Node18. I guess this is what I'm leaning towards atm.

@ahammond
Copy link
Contributor Author

ahammond commented Aug 15, 2023 via email

@rix0rrr
Copy link
Contributor

rix0rrr commented Aug 15, 2023

@rix0rrr should we just revert lambda-nodejs's default runtime to Node16?

Maybe, but that feels like it's going to stifle forward progress.

We can specifically detect this configuration and do addWarning/addError based upon it?

if (!props.runtime && (props?.externalModules ?? []).includes('aws-sdk')) {
  Annotations.of(this).addWarning('If you are relying on AWS SDK v2 to be present in the Lambda environment already, please explicitly configure a NodeJS runtime of Node 16 or lower.');
}

But yeah, I guess this is technically breaking, so we'll have to.

Doesn't aws-lambda-nodejs also do something crazy like use the NodeJS version used for synthesis as its default runtime? If we're going to go the feature flag route, we should just make runtime de-facto required with a runtime check, and explicitly opt-in to auto-upgrading behavior with something like Runtime.LATEST_NODE_JS.

MrArnoldPalmer added a commit that referenced this issue Aug 15, 2023
Previously we changed the default version of the lambda-nodejs Function
construct to go from using the `builtInNodeJsCustomResourceRuntime`, a
map of regions to available versions, to `lambda.Runtime.NODEJS_18_X`.
The default `externalModule` configuration excluded the aws-sdk
version based on the runtime passed, excluding v2 for Node16 and under,
and v3 for Node18 and up, but users can pass their own bundling
configuration excluding `aws-sdk` while not explicitly passing a
runtime, which caused their functions to break.

Adds a new `lambda.Runtime` value for `NODEJS_LATEST`. This is central
reference for the latest version of NodeJS provided by the lamdba
service. It also includes a new property `isLatest` which can be used to
indicate that the runtime version may change over time. This can used to
indicate that relying on packages shipped with the environment may not
be relied upon if the version changes. We default to using the
`NODEJS_LATEST` runtime only if the feature flag is enabled. If the flag
is not enabled, use `NODEJS_16_X` to keep supporting users current
bundling configurations.

Additionally, add a warning to tell users if they are excluding a
package from their bundling that we know doesn't exist within the
runtime they are using. IE, if using `NODEJS_18_X` and the exclude list
includes `aws-sdk`, warn users that it won't be present.

fixes: #26732
@mergify mergify bot closed this as completed in #26763 Aug 16, 2023
mergify bot pushed a commit that referenced this issue Aug 16, 2023
Previously we changed the default version of the lambda-nodejs Function construct to go from using the `builtInNodeJsCustomResourceRuntime`, a map of regions to available versions, to `lambda.Runtime.NODEJS_18_X`. The default `externalModule` configuration excluded the aws-sdk version based on the runtime passed, excluding v2 for Node16 and under, and v3 for Node18 and up, but users can pass their own bundling configuration excluding `aws-sdk` while not explicitly passing a runtime, which caused their functions to break.

Adds a new `lambda.Runtime` value for `NODEJS_LATEST`. This is central reference for the latest version of NodeJS provided by the lamdba service. It also includes a new property `isLatest` which can be used to indicate that the runtime version may change over time. This can used to indicate that relying on packages shipped with the environment may not be relied upon if the version changes. We default to using the `NODEJS_LATEST` runtime only if the feature flag is enabled. If the flag is not enabled, use `NODEJS_16_X` to keep supporting users current bundling configurations.

Additionally, add a warning to tell users if they are excluding a package from their bundling that we know doesn't exist within the runtime they are using. IE, if using `NODEJS_18_X` and the exclude list includes `aws-sdk`, warn users that it won't be present.

Fixes #26732

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
@github-actions
Copy link

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@aws-cdk/custom-resources Related to AWS CDK Custom Resources bug This issue is a bug. p1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants