Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use default credentials provider for all provided SDK clients #1303

Merged
merged 8 commits into from
Jul 24, 2023

Conversation

roamingthings
Copy link
Contributor

Issue #, if available: #1302

Description of changes:

This pull request removes the conditional configuration of the credentials provider for all SDK clients that are created when no other SDK client has been provided by the application. As a consequence these SDK clients will always use the credentials provider chain that is used by SDK clients by default.

This change is necessary because in some situations a Lambda function that is published using SnapStart will be re-initialized by the runtime with its invocation type set to on-demand. As a consequence the SDK client would be configured to only use the EnvironmentVariableCredentialsProvider. The next time the client is used an error will be thrown since the credentials are not provided differently than without SnapStart.

For more details please see #1302 and http://github.com/roamingthings/sdk-client-snapstart-error-poc

Checklist

Breaking change checklist

RFC issue #:

  • Migration process documented
  • Implement warnings (if it can live side by side)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@scottgerring
Copy link
Contributor

@msailes as you are closest to Snapstart and the original change it would be great if you could help us convince ourselves this is the best way to deal with this.

IMHO it would also be nice if we can add an E2E test for this - we have the framework for it now running in the repo, and @roamingthings has given us a reproducible deployable test case

@roamingthings
Copy link
Contributor Author

I have to add that I have a support ticket open to understand if the behavior of the runtime (cold-start after timeout) is intentional.

@scottgerring
Copy link
Contributor

Hey @roamingthings to keep you posted - we're asking around internally to make sure that the the default provider chain is the best way to handle this. Concretely this means making sure that we don't take a big performance hit when we work through it, and that there isn't some other best practice we should be employing. We're also having a bit of a look to see if we can add to our E2E tests to catch these sort of issues in the future.

@codecov-commenter
Copy link

codecov-commenter commented Jul 21, 2023

Codecov Report

Patch coverage: 20.00% and project coverage change: +0.46 🎉

Comparison is base (4c42359) 79.19% compared to head (5851589) 79.65%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1303      +/-   ##
============================================
+ Coverage     79.19%   79.65%   +0.46%     
  Complexity      639      639              
============================================
  Files            73       73              
  Lines          2360     2345      -15     
  Branches        258      253       -5     
============================================
- Hits           1869     1868       -1     
+ Misses          410      397      -13     
+ Partials         81       80       -1     
Impacted Files Coverage Δ
...mpotency/persistence/DynamoDBPersistenceStore.java 90.07% <0.00%> (+1.87%) ⬆️
...ambda/powertools/parameters/AppConfigProvider.java 89.65% <0.00%> (+4.40%) ⬆️
...lambda/powertools/parameters/DynamoDbProvider.java 84.48% <0.00%> (+4.15%) ⬆️
.../lambda/powertools/parameters/SecretsProvider.java 71.05% <0.00%> (+5.19%) ⬆️
...azon/lambda/powertools/parameters/SSMProvider.java 89.55% <100.00%> (+2.40%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@roamingthings
Copy link
Contributor Author

roamingthings commented Jul 21, 2023

@scottgerring Thank you for keeping me updated. At least support told me that this is the way to go. I'm curious what you will find out. Since there is a workaround for this issue I'm not in a hurry.

I like the idea of having E2E tests in place for something like this. It has been a short night when I discovered this in production.

@mriccia
Copy link
Contributor

mriccia commented Jul 21, 2023

Thank you @roamingthings
I took your example and I used it to run some Load Tests, both with the changes in this PR, and with version 1.16.1. (Note: I made some modifications to run without SnapStart and removing the timeout)

I confirm that with these changes there is no noticeable performance difference, you can find a summary of my tests below:

Version 1.16.1

Summary report @ 19:03:02(+0200) 2023-07-21
  Scenarios launched:  8250
  Scenarios completed: 4745
  Requests completed:  4745
  Mean response/sec: 36.58
  Response time (msec):
    min: 51
    max: 5678
    median: 81
    p95: 156.3
    p99: 5372.5
  Scenario counts:
    0: 8250 (100%)
  Codes:
    200: 4745
  Errors:
    ETIMEDOUT: 3505

Version 1.17.0-SNAPSHOT

Summary report @ 18:57:09(+0200) 2023-07-21
  Scenarios launched:  8246
  Scenarios completed: 4449
  Requests completed:  4449
  Mean response/sec: 36.6
  Response time (msec):
    min: 54
    max: 5923
    median: 80
    p95: 157
    p99: 5393.1
  Scenario counts:
    0: 8246 (100%)
  Codes:
    200: 4449
  Errors:
    ETIMEDOUT: 3797

@roamingthings
Copy link
Contributor Author

For reference this describes the behavior of the runtime when an error like a timeout occurs during the invoke phase. This matches the observed behavior.

https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html#runtimes-lifecycle-restore

@scottgerring
Copy link
Contributor

@mriccia thanks for the data! Looks like we can make this change without fearing a performance regression.

@pull-request-size pull-request-size bot added size/L and removed size/M labels Jul 24, 2023
@sonarcloud
Copy link

sonarcloud bot commented Jul 24, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

Copy link
Contributor

@scottgerring scottgerring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@msailes msailes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@mriccia mriccia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for spotting this issue and for writing this PR, @roamingthings!

@mriccia mriccia merged commit 3594bbd into aws-powertools:main Jul 24, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants