Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When multiple Lambda functions share code, build a single asset shared between them #989

Closed
tomfaulhaber opened this issue Oct 22, 2018 · 5 comments · Fixed by #1094 or #1011
Closed
Assignees

Comments

@tomfaulhaber
Copy link

Use case:

In order to build lambdas that use the Python science stack, we need to build a giant (50+MB) zip file with all the dependencies. We then add all of our Python functions to that zip and specify the same zip as the code for each lambda with a different handler.

The problem:

Each of the lambda functions that I create have a unique asset object (even if I create a single, shared lambda.Code object). Each of these objects creates an S3 object. So if I have ten lambda functions, I will upload ten separate asset objects at 50+ MB each.

Potential solutions:

  1. Make each lambda.AssetCode object unique by checking to see if asset is already set before creating a new one in bind(). This means that users would need to explicitly create the shared Code object and reuse it in each lambda definition.
  2. Make it unique to the path in the lambda.AssetCode by adding a map to lambda.AssetCode and pulling the same asset into multiple AssetCode objects. In this scenario, users could simply reference the same file/directory.
  3. Build the solution lower down in the assets.Asset class. In this example, we would cache the path and make sure that we only do the upload once.

I think the last solution is the only real solution, because the other two solutions don't handle the case where you create two lambdas, but only wire the second one into your app which is likely to happen in various scenarios.

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 22, 2018

We can also employ server-side copies (upload once then copy out)

@tomfaulhaber
Copy link
Author

@rix0rrr for my use case that would be bad since it would just make lots of extra copies and run up my S3 bill. Playing with the CDK has already created 27GB of stuff in my CDK assets bucket.

FWIW, I implemented option 1 and it seems to work great in my case, but I think it would fail if I didn't use my first lambda in my app.

@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 23, 2018

I understand. But, the reason we have multiple copies of the same file for multiple assets is that they are in different prefixes.

We have to allow that they can evolve individually (maybe they're the same by accident) and the consuming resources need read permissions on the entire prefix so that jobs started on an old asset don't suddenly lose perms to read their file if the "current" version changes.

Not applicable to Lamda so much but definitely to CodeBuild for example

@rix0rrr rix0rrr self-assigned this Oct 23, 2018
@rix0rrr
Copy link
Contributor

rix0rrr commented Oct 25, 2018

I see the issue with AssetCode. It's being silly by creating the same Asset object multiple times, and every Asset object triggers an upload. The proper solution would be to make AssetCode a construct, but in order not to break API compatibility I'm going to cache the Asset object like you've been doing locally.

I will trigger a server-side copy if the same Asset data is uploaded multiple times to different Asset objects, but I'm not going to deduplicate inside the S3 bucket (in order to not run afoul of permission issues as mentioned before).

I'm sorry about your S3 bill, but I think the more general solution to this will be some kind of garbage collection facility (as opposed to frugally uploading, which helps in the short term but not in the long run).

rix0rrr pushed a commit that referenced this issue Oct 25, 2018
This change implements two asset upload frugality measures:

- If a lambda.AssetCode object is reused for multiple Lambdas,
  the same underyling Asset object will be reused (which leads to
  the asset data only being uploaded once).
- If nonetheless multiple Asset objects are created for the same
  source data, the data will only be uploaded once and subsequently
  copied on the server-side to avoid the additional data transfer.

Fixes #989.
rix0rrr added a commit that referenced this issue Nov 1, 2018
This change implements two asset bandwidth conservation measures:

- If a lambda.AssetCode object is reused for multiple Lambdas,
  the same underyling Asset object will be reused (which leads to
  the asset data only being uploaded once).
- If nonetheless multiple Asset objects are created for the same
  source data, the data will only be uploaded once and subsequently
  copied on the server-side to avoid the additional data transfer.

Fixes #989.
rix0rrr pushed a commit that referenced this issue Nov 6, 2018
Bug Fixes
=========

* **aws-autoscaling:** allow minSize to be set to 0 ([#1015](#1015)) ([67f7fa1](67f7fa1))
* **aws-codebuild:** correctly pass the timeout property to CFN when creating a Project. ([#1071](#1071)) ([b1322bb](b1322bb))
* **aws-codebuild:** correctly set S3 path when using it as artifact. ([#1072](#1072)) ([f32cba9](f32cba9))
* **aws-kms:** add output value when exporting an encryption key ([#1036](#1036)) ([cb490be](cb490be))
* Switch from `js-yaml` to `yaml` ([#1092](#1092)) ([0b132b5](0b132b5))

Features
=========

* **applets:** integrate into toolkit ([#1039](#1039)) ([fdabe95](fdabe95)), closes [#849](#849) [#342](#342) [#291](#291)
* **aws-codecommit:** use CloudWatch Events instead of polling by default in the CodePipeline Action. ([#1026](#1026)) ([d09d30c](d09d30c))
* **aws-dynamodb:** allow specifying partition/sort keys in props ([#1054](#1054)) ([ec87331](ec87331)), closes [#1051](#1051)
* **aws-ec2:** AmazonLinuxImage supports AL2 ([#1081](#1081)) ([97b57a5](97b57a5)), closes [#1062](#1062)
* **aws-lambda:** high level API for event sources ([#1063](#1063)) ([1be3442](1be3442))
* **aws-sqs:** improvements to IAM grants API ([#1052](#1052)) ([6f2475e](6f2475e))
* don't upload the same asset multiple times ([#1011](#1011)) ([35937b6](35937b6)), closes [#989](#989)
* **codepipeline/cfn:** Use fewer statements for pipeline permissions ([#1009](#1009)) ([8f4c2ab](8f4c2ab))
* add a new construct library for ECS ([#1058](#1058)) ([ae03ddb](ae03ddb))
* **pkglint:** Make sure .snk files are ignored ([#1049](#1049)) ([53c8d76](53c8d76)), closes [#643](#643)
* **toolkit:** deployment ui improvements ([#1067](#1067)) ([c832eaf](c832eaf))

BREAKING CHANGES
=========

* The ec2.Connections object has been changed to be able to manage multiple
  security groups. The relevant property has been changed from `securityGroup`
  to `securityGroups` (an array of security group objects).
* **aws-codecommit:** This modifies the default behavior of the CodeCommit
  Action.  It also changes the internal API contract between the
  aws-codepipeline-api module and the CodePipeline Actions in the service
  packages.
* **applets:** The applet schema has changed to allow Multiple applets can be
  define in one file by structuring the files like this:
* **applets:** The applet schema has changed to allow definition of multiple
  applets in the same file.

The schema now looks like this:

    applets:
      MyApplet:
        type: ./my-applet-file
        properties:
          property1: value
          ...
By starting an applet specifier with npm://, applet modules can
directly be referenced in NPM. You can include a version specifier
(@1.2.3) to reference specific versions.
* **aws-sqs:** `queue.grantReceiveMessages` has been removed. It is unlikely
  that this would be sufficient to interact with a queue. Alternatively you can
  use `queue.grantConsumeMessages` or `queue.grant('sqs:ReceiveMessage')` if
  there's a need to only grant this action.
@rix0rrr rix0rrr mentioned this issue Nov 6, 2018
rix0rrr pushed a commit that referenced this issue Nov 6, 2018
Bug Fixes
========

* **aws-autoscaling:** allow minSize to be set to 0 ([#1015](#1015)) ([67f7fa1](67f7fa1))
* **aws-codebuild:** correctly pass the timeout property to CFN when creating a Project. ([#1071](#1071)) ([b1322bb](b1322bb))
* **aws-codebuild:** correctly set S3 path when using it as artifact. ([#1072](#1072)) ([f32cba9](f32cba9))
* **aws-kms:** add output value when exporting an encryption key ([#1036](#1036)) ([cb490be](cb490be))
* Switch from `js-yaml` to `yaml` ([#1092](#1092)) ([0b132b5](0b132b5))

Features
========

* don't upload the same asset multiple times ([#1011](#1011)) ([35937b6](35937b6)), closes [#989](#989)
* **app-delivery:** CI/CD for CDK Stacks ([#1022](#1022)) ([f2fe4e9](f2fe4e9))
* add a new construct library for ECS ([#1058](#1058)) ([ae03ddb](ae03ddb))
* **applets:** integrate into toolkit ([#1039](#1039)) ([fdabe95](fdabe95)), closes [#849](#849) [#342](#342) [#291](#291)
* **aws-codecommit:** use CloudWatch Events instead of polling by default in the CodePipeline Action. ([#1026](#1026)) ([d09d30c](d09d30c))
* **aws-dynamodb:** allow specifying partition/sort keys in props ([#1054](#1054)) ([ec87331](ec87331)), closes [#1051](#1051)
* **aws-ec2:** AmazonLinuxImage supports AL2 ([#1081](#1081)) ([97b57a5](97b57a5)), closes [#1062](#1062)
* **aws-lambda:** high level API for event sources ([#1063](#1063)) ([1be3442](1be3442))
* **aws-sqs:** improvements to IAM grants API ([#1052](#1052)) ([6f2475e](6f2475e))
* **codepipeline/cfn:** Use fewer statements for pipeline permissions ([#1009](#1009)) ([8f4c2ab](8f4c2ab))
* **pkglint:** Make sure .snk files are ignored ([#1049](#1049)) ([53c8d76](53c8d76)), closes [#643](#643)
* **toolkit:** deployment ui improvements ([#1067](#1067)) ([c832eaf](c832eaf))
* Update to CloudFormation resource specification v2.11.0

BREAKING CHANGES
========

* The ec2.Connections object has been changed to be able to manage multiple
  security groups. The relevant property has been changed from `securityGroup`
  to `securityGroups` (an array of security group objects).
* **aws-codecommit:** this modifies the default behavior of the CodeCommit
  Action.  It also changes the internal API contract between the
  aws-codepipeline-api module and the CodePipeline Actions in the service
  packages.
* **applets:** The applet schema has changed to allow Multiple applets can be
  define in one file by structuring the files like this:
* **applets:** The applet schema has changed to allow definition of multiple
  applets in the same file.

The schema now looks like this:

    applets:
      MyApplet:
        type: ./my-applet-file
        properties:
          property1: value
          ...
By starting an applet specifier with npm://, applet modules can directly be
referenced in NPM. You can include a version specifier (@1.2.3) to reference
specific versions.
* **aws-sqs:** `queue.grantReceiveMessages` has been removed. It is unlikely
  that this would be sufficient to interact with a queue. Alternatively you can
  use `queue.grantConsumeMessages` or `queue.grant('sqs:ReceiveMessage')` if
  there's a need to only grant this action.
rix0rrr added a commit that referenced this issue Nov 6, 2018
Bug Fixes
========

* **aws-autoscaling:** allow minSize to be set to 0 ([#1015](#1015)) ([67f7fa1](67f7fa1))
* **aws-codebuild:** correctly pass the timeout property to CFN when creating a Project. ([#1071](#1071)) ([b1322bb](b1322bb))
* **aws-codebuild:** correctly set S3 path when using it as artifact. ([#1072](#1072)) ([f32cba9](f32cba9))
* **aws-kms:** add output value when exporting an encryption key ([#1036](#1036)) ([cb490be](cb490be))
* Switch from `js-yaml` to `yaml` ([#1092](#1092)) ([0b132b5](0b132b5))

Features
========

* don't upload the same asset multiple times ([#1011](#1011)) ([35937b6](35937b6)), closes [#989](#989)
* **app-delivery:** CI/CD for CDK Stacks ([#1022](#1022)) ([f2fe4e9](f2fe4e9))
* add a new construct library for ECS ([#1058](#1058)) ([ae03ddb](ae03ddb))
* **applets:** integrate into toolkit ([#1039](#1039)) ([fdabe95](fdabe95)), closes [#849](#849) [#342](#342) [#291](#291)
* **aws-codecommit:** use CloudWatch Events instead of polling by default in the CodePipeline Action. ([#1026](#1026)) ([d09d30c](d09d30c))
* **aws-dynamodb:** allow specifying partition/sort keys in props ([#1054](#1054)) ([ec87331](ec87331)), closes [#1051](#1051)
* **aws-ec2:** AmazonLinuxImage supports AL2 ([#1081](#1081)) ([97b57a5](97b57a5)), closes [#1062](#1062)
* **aws-lambda:** high level API for event sources ([#1063](#1063)) ([1be3442](1be3442))
* **aws-sqs:** improvements to IAM grants API ([#1052](#1052)) ([6f2475e](6f2475e))
* **codepipeline/cfn:** Use fewer statements for pipeline permissions ([#1009](#1009)) ([8f4c2ab](8f4c2ab))
* **pkglint:** Make sure .snk files are ignored ([#1049](#1049)) ([53c8d76](53c8d76)), closes [#643](#643)
* **toolkit:** deployment ui improvements ([#1067](#1067)) ([c832eaf](c832eaf))
* Update to CloudFormation resource specification v2.11.0

BREAKING CHANGES
========

* The ec2.Connections object has been changed to be able to manage multiple
  security groups. The relevant property has been changed from `securityGroup`
  to `securityGroups` (an array of security group objects).
* **aws-codecommit:** this modifies the default behavior of the CodeCommit
  Action.  It also changes the internal API contract between the
  aws-codepipeline-api module and the CodePipeline Actions in the service
  packages.
* **applets:** The applet schema has changed to allow Multiple applets can be
  define in one file by structuring the files like this:
* **applets:** The applet schema has changed to allow definition of multiple
  applets in the same file.

The schema now looks like this:

    applets:
      MyApplet:
        type: ./my-applet-file
        properties:
          property1: value
          ...
By starting an applet specifier with npm://, applet modules can directly be
referenced in NPM. You can include a version specifier (@1.2.3) to reference
specific versions.
* **aws-sqs:** `queue.grantReceiveMessages` has been removed. It is unlikely
  that this would be sufficient to interact with a queue. Alternatively you can
  use `queue.grantConsumeMessages` or `queue.grant('sqs:ReceiveMessage')` if
  there's a need to only grant this action.
@AaronHarris
Copy link

What's the workaround here? Just pass in the same lambda.AssetCode() object ref to different lambdas?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants