Cold-start improvements with a lightweight client #2527

Closed
davidmoten opened this issue Jun 13, 2021 · 12 comments
Labels
guidance Question that needs advice or information.

Comments

@davidmoten

davidmoten commented Jun 13, 2021

I wanted to improve my Lambda cold-start times (and still use Java) so I knocked up a lightweight client at https://github.com/davidmoten/aws-lightweight-client-java. The client has a friendly API, performs the required AWS Signature Version 4 request signing, and includes a built-in XML parser. The artifact is 56K at the moment, which of course has a dramatic effect on the class-loading delays incurred in the cold start of a Lambda function.
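For readers unfamiliar with the signing step mentioned above: the core of AWS Signature Version 4 is a chain of HMAC-SHA256 operations that derives a signing key scoped to a date, region and service from the secret access key. The sketch below shows only that key derivation, using the JDK's `javax.crypto`. It is not taken from the lightweight client's source, and the names `SigV4KeySketch`, `signingKey` and `hex` are made up for illustration. Building the canonical request and the string to sign is omitted.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

// Illustrative sketch only: class and method names are hypothetical.
final class SigV4KeySketch {

    // One HMAC-SHA256 step: key is raw bytes, data is UTF-8 text.
    private static byte[] hmacSha256(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            // HmacSHA256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Derives the signing key: date is yyyyMMdd, e.g. "20210613".
    static byte[] signingKey(String secretKey, String date, String region, String service) {
        byte[] kSecret = ("AWS4" + secretKey).getBytes(StandardCharsets.UTF_8);
        byte[] kDate = hmacSha256(kSecret, date);
        byte[] kRegion = hmacSha256(kDate, region);
        byte[] kService = hmacSha256(kRegion, service);
        return hmacSha256(kService, "aws4_request");
    }

    // Lowercase hex encoding, as used for the final signature value.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

The final request signature would be the hex-encoded HMAC of the string to sign under this key. Everything here needs only JDK classes, which is part of why a signing client can stay small.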

I thought the AWS Java SDK dev team might be interested in the perf benefits of the lightweight client, and users should also know they have an alternative for really getting Java Lambda cold-start times down (if they can do without the auto-complete goodness of the AWS SDK).

Review welcome at https://github.com/davidmoten/aws-lightweight-client-java.

Update: I've noticed some differences in cold start with the lightweight client just by making the client variables static fields. I'll do the same with SDK v1 and v2 and get some more stats.

@davidmoten davidmoten added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels Jun 13, 2021
@debora-ito
Member

Thank you for sharing your project @davidmoten!

Curious to know which Java SDK v2 HTTP client you used in the performance comparison. Was it the default Apache http-client? Maybe you've answered this already in the repo documentation; I will read more about it later this week.

@debora-ito debora-ito removed the needs-triage This issue or PR still needs to be triaged. label Jun 14, 2021
@davidmoten
Author

davidmoten commented Jun 15, 2021

Curious to know which Java SDK v2 HTTP client you used in the performance comparison. Was it the default Apache http-client? Maybe you've answered this already in the repo documentation; I will read more about it later this week.

I followed the suggested optimizations for v2 (all except keeping sdk clients as static fields which I'll look at next). Those optimizations included using the java platform URLConnection client and excluding apache and netty clients from the built jar.

@davidmoten
Author

I moved the instantiation of client objects to static fields and reran my analysis. There is still a big cold-start advantage for using the lightweight client. I've documented it here (and included exact source code used so you can check my patterns).
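As a rough illustration of the static-field pattern described above (not the exact source used in the benchmark), the sketch below constructs the client once during class initialization and reuses it on every invocation handled by the same JVM. `ExpensiveClient` and `Handler` are hypothetical stand-ins: a real handler would hold an SDK or lightweight client and implement the Lambda handler interface.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: all names here are hypothetical.
final class StaticClientSketch {

    // Stand-in for a client that is costly to construct.
    static final class ExpensiveClient {
        static final AtomicInteger CONSTRUCTIONS = new AtomicInteger();

        ExpensiveClient() {
            CONSTRUCTIONS.incrementAndGet();
        }

        String call(String input) {
            return "handled:" + input;
        }
    }

    // Shaped like a Lambda handler; the client lives in a static field.
    static final class Handler {
        // Runs once, when the Handler class is initialized, not per request.
        private static final ExpensiveClient CLIENT = new ExpensiveClient();

        String handleRequest(String input) {
            return CLIENT.call(input);
        }
    }
}
```

With this shape, creating several `Handler` instances or handling several requests still constructs the client only once per JVM, so the construction cost is paid during initialization rather than inside each request.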

Here's an extract that is the request time in seconds against an API Gateway + 2GB Lambda integration and you can see that the lightweight client takes about 40% off cold start times on average compared to the v2 SDK.

Client        Average  Stdev  Min    Max    n
AWS SDK v1    3.987    0.320  3.583  5.280  28
AWS SDK v2    3.153    0.267  2.918  4.060  28
lightweight   1.938    0.149  1.739  2.376  28
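To make the "about 40%" figure concrete, the relative reduction can be recomputed from the averages in the table. The small sketch below does that arithmetic; the class and method names are made up for illustration. From the table values, the reduction works out to roughly 38.5% against SDK v2 and roughly 51.4% against SDK v1.

```java
import java.util.Locale;

// Illustrative sketch only: recomputes the reduction from the table averages.
final class ColdStartReduction {

    // Fraction of the baseline time saved by the candidate.
    static double reduction(double baselineSeconds, double candidateSeconds) {
        return (baselineSeconds - candidateSeconds) / baselineSeconds;
    }

    public static void main(String[] args) {
        double v1 = 3.987, v2 = 3.153, lightweight = 1.938; // averages in seconds
        System.out.printf(Locale.ROOT, "vs SDK v2: %.1f%%%n", 100 * reduction(v2, lightweight));
        System.out.printf(Locale.ROOT, "vs SDK v1: %.1f%%%n", 100 * reduction(v1, lightweight));
    }
}
```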

@debora-ito
Member

@davidmoten what's your expectation with this issue? Is it for us to review the lightweight client?

@davidmoten
Author

@debora-ito A review would be welcome yes if you or other community members have the time. The library has brought my java lambda cold-start time down to the point where I'm quite happy for a cold-start to be incurred occasionally in a user-facing situation (without paying for provisioned concurrency for example). I assume I'm not the only one out there in this situation and having constructed a library for general use I'd like people to know it's there (so they don't have to waste their time building a similar product). A mention of the product might be good on AWS Documentation where you discuss best practices for reducing cold-start times ("if you really want further cold-start time reductions and can manage without auto-complete IDE goodness then have a look at project X").

@debora-ito debora-ito added the needs-review This issue or PR needs review from the team. label Aug 16, 2021
@lestephane

lestephane commented Nov 26, 2021

(My 2 cents, I'm not an employee at AWS)

The S3 PutObject functionality is not usable because it does not support a requestBody provided as an InputStream, which is required when putting large files to S3 when using the file system is not an option (e.g. in a Lambda environment, for files > 250MB).

Mind you, the URLConnection-based client is not any better at this: it loads everything supplied as an InputStream into a ByteOutputStream, causing OOMEs (related PR: #2848).

So there is no real alternative to Apache right now for the use case described above. Maybe this test should be added to some sort of s3 client compatibility test suite.

@davidmoten
Author

davidmoten commented Nov 26, 2021

@lestephane, have a look at https://github.com/davidmoten/aws-lightweight-client-java/wiki/Recipes#multipart-upload-a-file and the other variants including specifying the data as an InputStream or actively writing to an OutputStream that gets pushed up via Multipart.

By the way I use the OutputStream method mentioned above in Lambdas to read a large object from s3 and transform it in a streaming way to another object in s3 without blowing out memory and without needing to stage to the very limited file system.
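The general idea behind streaming a large object through a multipart upload can be sketched as follows: read the InputStream in fixed-size chunks and hand each chunk to an upload callback, so that only one part is held in memory at a time. This is a sketch under assumptions, not the lightweight client's implementation. `PartChunkerSketch`, `forEachPart` and the `uploadPart` callback are hypothetical names, and the actual S3 calls (create upload, upload part, complete upload) are left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.function.BiConsumer;

// Illustrative sketch only: names are hypothetical, S3 calls are omitted.
final class PartChunkerSketch {

    // Fills the buffer as far as possible; returns bytes read (0 at end of stream).
    private static int fill(InputStream in, byte[] buffer) throws IOException {
        int total = 0;
        while (total < buffer.length) {
            int n = in.read(buffer, total, buffer.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    // Calls uploadPart(partNumber, bytes) for each chunk; returns the number of parts.
    static int forEachPart(InputStream in, int partSize, BiConsumer<Integer, byte[]> uploadPart) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        byte[] buffer = new byte[partSize];
        int partNumber = 0;
        try {
            int n;
            while ((n = fill(in, buffer)) > 0) {
                partNumber++; // part numbers start at 1
                uploadPart.accept(partNumber, Arrays.copyOf(buffer, n));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return partNumber;
    }
}
```

Memory use is bounded by the part size rather than the object size. In a real upload the part size would need to respect S3's multipart limits (every part except the last must be at least 5 MiB).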

@millems
Contributor

millems commented Dec 15, 2021

@davidmoten Thanks for your work on this! It's nice to see what the ceiling is on cold-start performance.

From reviewing your implementation and profiler results, we generated the following recommendations on improving cold-start in the 2.x SDK (ordered by cold-start benefit it would provide):

  1. Modularize region metadata loading so that it is per-service, so region data doesn't need to be loaded for all services when the user is only using one.
  2. Allow loading execution interceptors statically + create "lightweight" client creation builder: S3Client.builder(Feature...) that has customers opt-in to features that might slow things down (e.g. dynamic HTTP client loading, dynamic execution interceptor loading, slow credential/region providers).
  3. Modularize the SDK and in the process, reduce the number of classes necessary to compose a client.
  4. Reduce unnecessary data copying.

Were there any other magic tricks that we might have missed?
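To picture the opt-in builder idea from item 2, here is a purely hypothetical sketch. No `S3Client.builder(Feature...)` overload is confirmed to exist in the SDK; the `Feature` enum, its constants and the `Client` class below are invented to show the shape of the proposal: nothing slow is enabled unless the caller asks for it.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of an opt-in builder; none of these names are SDK APIs.
final class OptInBuilderSketch {

    // Features that might slow down client creation, all off by default.
    enum Feature {
        DYNAMIC_HTTP_CLIENT_LOADING,
        DYNAMIC_INTERCEPTOR_LOADING,
        DEFAULT_CREDENTIAL_CHAIN
    }

    static final class Client {
        private final Set<Feature> enabled;

        private Client(Set<Feature> enabled) {
            this.enabled = Collections.unmodifiableSet(enabled);
        }

        boolean isEnabled(Feature feature) {
            return enabled.contains(feature);
        }
    }

    // Only the features passed in are switched on.
    static Client build(Feature... features) {
        EnumSet<Feature> enabled = EnumSet.noneOf(Feature.class);
        Collections.addAll(enabled, features);
        return new Client(enabled);
    }
}
```

The design choice being illustrated is the default: a caller who passes no features gets the cheapest possible client, and each convenience that costs start-up time becomes an explicit decision.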

@millems millems self-assigned this Dec 15, 2021
@RyanHoldren

I for one would really appreciate more modularization of the SDK.

In our project, about 20% of all the classes loaded at runtime (3195, to be exact, taking up 30 megabytes of metaspace) are from the AWS SDK, even though we barely use it.

For example, this code...

try (final var client = SecretsManagerClient.builder().httpClient(httpClient).build()) {
	final var request = GetSecretValueRequest.builder().secretId(SECRET_ID).build();
	final var response = client.getSecretValue(request);
	return response.secretString();
}

... loads 109 classes just from software.amazon.awssdk.services.secretsmanager.*.

It seems the current architecture causes all the request (e.g. RotateSecretRequest, ValidateResourcePolicyRequest, TagResourceRequest, etc) and all exception classes (e.g. PublicPolicyException, EncryptionFailureException, PreconditionNotMetException, etc) to be loaded whenever a client is created.
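For anyone wanting to reproduce this kind of measurement, one possible approach (not necessarily the one used for the numbers above) is the JDK's `ClassLoadingMXBean`, which reports how many classes the JVM has loaded so far. The sketch below measures the delta around an arbitrary action; `LoadedClassCountSketch` and `classesLoadedBy` are illustrative names. The count is JVM-wide, so classes loaded concurrently by other threads are included.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch only: measures the JVM-wide class-load delta around an action.
final class LoadedClassCountSketch {

    // Returns how many classes were loaded while the action ran.
    static long classesLoadedBy(Runnable action) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long before = bean.getTotalLoadedClassCount();
        action.run();
        return bean.getTotalLoadedClassCount() - before;
    }

    public static void main(String[] args) {
        // Any action works here; in practice it would be building an SDK client.
        long loaded = classesLoadedBy(
                () -> new java.util.concurrent.ConcurrentSkipListMap<String, String>());
        System.out.println("classes loaded: " + loaded);
    }
}
```

To see which classes were loaded rather than just how many, the JVM's class-loading log (`-verbose:class`, or `-Xlog:class+load` on Java 9 and later) lists each one by name.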

@millems
Contributor

millems commented Jan 4, 2022

The EC2 client is the worst at that; it's massive. We're limited in our ability to break up a single client into multiple clients or to limit additions to existing clients, because that's a decision of the service team, but we could probably limit the amount of duplicate code per operation a bit better.

@millems
Contributor

millems commented Jan 4, 2022

I'm going to close out this issue, then. Thanks again for your code, we have the action items I listed above as follow-up. As usual, we can't give dates on when things will get done because we're always spinning a lot of plates.

@millems millems closed this as completed Jan 4, 2022
@github-actions

github-actions bot commented Jan 4, 2022

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@debora-ito debora-ito removed the needs-review This issue or PR needs review from the team. label Jan 18, 2022
aws-sdk-java-automation pushed a commit that referenced this issue Apr 20, 2023
…5-4e96-9d15-8ae6dc8053a3

Pull request: release <- staging/6e6ef805-ba45-4e96-9d15-8ae6dc8053a3
5 participants