
Default AWS credentials provider chain resolution changed in v3 for EC2MetadataCredentials/ECSCredentials - readFile calls more frequent #2027

Closed
TysonAndre opened this issue Feb 11, 2021 · 7 comments · Fixed by #2223
Labels
bug This issue is a bug. performance performance regression


@TysonAndre
Contributor

TysonAndre commented Feb 11, 2021

Describe the bug

After upgrading an ECS (Fargate) service from aws-sdk-js v2 to v3, I started seeing many more calls to readFile per transaction (observed in New Relic), despite no environment variables changing.

The usage is effectively const { DynamoDB } = require('@aws-sdk/client-dynamodb'); /* ... */ const db = new DynamoDB({region: 'us-east-1'});, and the DynamoDB instance is reused.

I suspect this is because the existence of the AWS_PROFILE environment variable changes which credentials provider chain is used; this was not the case in v2.
(I still haven't confirmed this.)

  • EDIT: It seems like I was misled. AWS_PROFILE is only used when building the application in question for ECS, not when running the containers on ECS. Maybe it's some other issue, such as credentials expiration not being set or too many brand-new connections being created, but I can't imagine what.
  • EDIT: Before and after installing v3, the application calls aws-sdk v2's new AWS.SecretsManager({endpoint, region}) before other AWS calls, but that doesn't seem relevant.
  • EDIT: I wonder if v3 is parsing the wrong expiry somehow; e.g. aws-sdk v2's node_modules/aws-sdk/lib/credentials/remote_credentials.js had expireTime: new Date(credData.expiration || credData.Expiration). I also see the ini files checked before remote credentials.
  • EDIT: Strangely, the calls to DynamoDB seem to succeed, or at least I haven't seen errors.
  • EDIT (2021-03-14): Credentials from EKS service account #1808 (comment) explains why: "The TokenFileWebIdentityCredentials provider was added in v2 in aws/aws-sdk-js 2737; it's missing in JS SDK v3, as called out in our blog post." (It wasn't obvious to me which provider the v2 SDK was using in Fargate.)

I don't see any documentation in the file itself of why ENV_PROFILE ('AWS_PROFILE') is used this way. It seems to have been done in ecb884b or earlier and kept that way since. (EDIT: probably not the cause of this issue; AWS_PROFILE is not included in https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-config.html and the AWS console does not show it among the additional environment variables.)

  • E.g. adding a line comment explaining the intent would help.
  • I didn't see this change to the credentials provider chain mentioned when skimming the migration notes for v3, but I may have just missed it.

V3 credential chain is conditional

// aws-sdk-js-v3/packages/credential-provider-node/src/index.ts
// Observation: the provider chain depends on AWS_PROFILE. If the AWS_PROFILE environment
// variable is set, the remote (ECS/EC2 metadata) credentials provider is dropped from the chain.
export function defaultProvider(init: FromIniInit & RemoteProviderInit & FromProcessInit = {}): CredentialProvider {
  const { profile = process.env[ENV_PROFILE] } = init;
  const providerChain = profile
    ? chain(fromIni(init), fromProcess(init))
    : chain(fromEnv(), fromIni(init), fromProcess(init), remoteProvider(init));

  return memoize(
    providerChain,
    (credentials) => credentials.expiration !== undefined && credentials.expiration.getTime() - Date.now() < 300000,
    (credentials) => credentials.expiration !== undefined
  );
}
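
For anyone who wants to exercise the chain above outside a full client, the default provider can be called directly (a minimal sketch; resolving once with AWS_PROFILE set and once without should show whether the remote provider is reachable):

// Sketch: resolve credentials through the default chain directly, to compare
// behavior with and without AWS_PROFILE set in the environment.
const { defaultProvider } = require("@aws-sdk/credential-provider-node");

(async () => {
  const credentials = await defaultProvider()();
  console.log("accessKeyId:", credentials.accessKeyId, "expiration:", credentials.expiration);
})().catch((err) => console.error("credential resolution failed:", err));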

V2 credential chain has no conditions

// node_modules/aws-sdk/lib/node_loader.js

// Setup default chain providers
// If this changes, please update documentation for
// AWS.CredentialProviderChain.defaultProviders in
// credentials/credential_provider_chain.js
AWS.CredentialProviderChain.defaultProviders = [
  function () { return new AWS.EnvironmentCredentials('AWS'); },
  function () { return new AWS.EnvironmentCredentials('AMAZON'); },
  function () { return new AWS.SharedIniFileCredentials(); },
  function () { return new AWS.ECSCredentials(); },
  function () { return new AWS.ProcessCredentials(); },
  function () { return new AWS.TokenFileWebIdentityCredentials(); },
  function () { return new AWS.EC2MetadataCredentials(); }
];

Your environment

SDK version number

@aws-sdk/client-dynamodb@3.4.1 (3.4.1 for everything else)

Is the issue in the browser/Node.js/ReactNative?

Node.js

Details of the browser/Node.js/ReactNative version

node -v is roughly v14.15.5 (using Docker node:14)

Steps to reproduce

TODO

Observed behavior

Overall performance is worse: requests take longer to process after upgrading from v2 to v3. Calls to Filesystem.readFile are very frequent, probably from trying to look up AWS credentials in a missing file or from failing to cache them.

Expected behavior

No performance impact.

If the environment variable AWS_PROFILE is set, that should not prevent the ECS/EC2 metadata service from being used when its required environment variables are available (i.e. chain(fromIni(init), fromProcess(init)) currently does not include the remote provider).
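
To make the expectation concrete, a sketch of what the conditional could look like instead (illustration only; this is not the actual change that closed the issue):

// Sketch: keep the remote (ECS/EC2 metadata) provider reachable even when a
// profile is set, so container credentials can still be resolved as a fallback.
const providerChain = profile
  ? chain(fromIni(init), fromProcess(init), remoteProvider(init))
  : chain(fromEnv(), fromIni(init), fromProcess(init), remoteProvider(init));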

Screenshots

(A spike in calls to readFile (green) was seen when upgrading to aws-sdk-js-v3 and went away after rolling back.)


Additional context

Add any other context about the problem here.

@TysonAndre TysonAndre added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Feb 11, 2021
@TysonAndre TysonAndre changed the title Default AWS credentials provider chain resolution changed in v3 for EC2MetadataCredentials/ECSCredentials Default AWS credentials provider chain resolution changed in v3 for EC2MetadataCredentials/ECSCredentials - readFile calls more frequent Feb 11, 2021
@davidhu2000

We are seeing the same thing in our tests. Here is an example Datadog trace:


We are making two DynamoDB calls, and it's reading the filesystem twice for each request:

  • one read of /root/.aws/config
  • one read of /root/.aws/credentials

even though we have environment variables set for the access key and secret key. Due to #1808, we used this as a workaround to avoid downgrading to v2.

@trivikr trivikr removed the needs-triage This issue or PR still needs to be triaged. label Mar 18, 2021
@trivikr trivikr self-assigned this Mar 18, 2021
@davidhu2000

I ran another test, and we are definitely seeing a performance hit here.

Here is a graph of time spent:

  • purple - the fs operations for the config/credentials files
  • yellow - the HTTP operations to DynamoDB

Here is another view


So we are seeing an overall latency increase in the app. I haven't figured out the HTTP increase yet.

@zbzoch

zbzoch commented Apr 2, 2021

We're seeing the performance penalty caused by two fs operations in every request too.

We're configuring the client with static credentials (see below), so I'd expect no other credential provider resolution to be done, as it's not needed:

new DynamoDBClient({
  credentials: {
    accessKeyId: <access_key>,
    secretAccessKey: <secret_key>
  }
})

Any idea what's going on? Version 3.11.0

@TysonAndre
Contributor Author

TysonAndre commented Apr 2, 2021

Any idea what's going on? Version 3.11.0

According to #1808 (comment)

when do we expect this to be fully released for use?

The PR #2177 will be published with v3.10.0 on Friday 3/26.
We're likely going to push support to read token file from shared config in #2178 also in the same release.

Ability to call fromTokenFile automatically is a stretch goal for tomorrow, and it's being tracked in #2148

I can't be sure; I'm just guessing without having seen any feedback from maintainers so far. It seems like #2148 is still open, and this issue I reported may be a symptom of that? My use case has some things in common with the other tickets, such as AWS_PROFILE, but I'm uncertain whether the framework I'm using uses source_profile (though the applications using the framework do depend on sts).

I see that PR #2179 is linked to that issue and is still open.


I haven't had time to get a stack trace for this myself because it's running in Fargate. It'd be useful if someone encountering the issue could provide:

  1. What file is being read
  2. The stack trace of whatever is reading the file (e.g. by patching fs.readFile as below)
const fs = require('fs');
const originalReadFile = fs.readFile;
// Log the file path and a stack trace for every readFile call, then delegate
// to the original implementation.
fs.readFile = (...args) => {
    console.log(`file=${args[0]} stack=${new Error().stack}`);
    return originalReadFile.apply(fs, args);
};
// Self-test: read an arbitrary file to confirm the patch logs and still works.
fs.readFile('node.js', (err, data) => console.log(`err=${err} data=${data}`));

@matej-prokop

matej-prokop commented Apr 5, 2021

@TysonAndre here you go:

file=/myuser/.aws/config stack=Error
    at Object.fs.readFile (/somepath/src/app.js:13:48)
    at /somepath/node_modules/@aws-sdk/shared-ini-file-loader/dist/cjs/index.js:66:10
    at new Promise (<anonymous>)
    at slurpFile (/somepath/node_modules/@aws-sdk/shared-ini-file-loader/dist/cjs/index.js:65:29)
    at Object.loadSharedConfigFiles (/somepath/node_modules/@aws-sdk/shared-ini-file-loader/dist/cjs/index.js:13:9)
    at /somepath/node_modules/@aws-sdk/node-config-provider/dist/cjs/fromSharedConfigFiles.js:12:53
    at /somepath/node_modules/@aws-sdk/property-provider/dist/cjs/chain.js:20:28
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async Object.defaultUserAgentProvider (/somepath/node_modules/@aws-sdk/util-user-agent-node/dist/cjs/index.js:32:19)
    at async /somepath/node_modules/@aws-sdk/middleware-user-agent/dist/cjs/user-agent-middleware.js:25:31
    at async /somepath/node_modules/@aws-sdk/middleware-logger/dist/cjs/loggerMiddleware.js:6:22
file=/myuser/.aws/credentials stack=Error
    at Object.fs.readFile (/somepath/src/app.js:13:48)
    at /somepath/node_modules/@aws-sdk/shared-ini-file-loader/dist/cjs/index.js:66:10
    at new Promise (<anonymous>)
    at slurpFile (/somepath/node_modules/@aws-sdk/shared-ini-file-loader/dist/cjs/index.js:65:29)
    at Object.loadSharedConfigFiles (/somepath/node_modules/@aws-sdk/shared-ini-file-loader/dist/cjs/index.js:14:9)
    at /somepath/node_modules/@aws-sdk/node-config-provider/dist/cjs/fromSharedConfigFiles.js:12:53
    at /somepath/node_modules/@aws-sdk/property-provider/dist/cjs/chain.js:20:28
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async Object.defaultUserAgentProvider (/somepath/node_modules/@aws-sdk/util-user-agent-node/dist/cjs/index.js:32:19)
    at async /somepath/node_modules/@aws-sdk/middleware-user-agent/dist/cjs/user-agent-middleware.js:25:31
    at async /somepath/node_modules/@aws-sdk/middleware-logger/dist/cjs/loggerMiddleware.js:6:22

Used "@aws-sdk/client-dynamodb": "3.11.0" when DynamoDBClient was initialized in following way:

new DynamoDBClient({
  region: envVariables.DYNAMO_REGION, // has valid region
  endpoint: envVariables.DYNAMO_ENDPOINT, // was undefined during my test
  apiVersion: "2012-08-10",
  credentials: {
    accessKeyId: envVariables.X_AWS_ACCESS_KEY_ID, // has valid access key
    secretAccessKey: envVariables.X_AWS_SECRET_ACCESS_KEY // has valid secret key
  },
  tableName: envVariables.DYNAMO_TABLE_NAME, // we use this "internally"
  maxAttempts: 5
});

@AllanZhengYP AllanZhengYP added the performance performance regression label Apr 8, 2021
@AllanZhengYP
Contributor

AllanZhengYP commented Apr 8, 2021

@TysonAndre @matej-prokop Thank you for sharing the detailed stack traces. I think I have an idea where these readFile calls come from: they are from the SDK trying to load the App ID, and the result is not cached correctly. I will put up a PR for this shortly.
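
(For context, a minimal sketch of the kind of caching that avoids re-reading the shared config file on every request; the names here are hypothetical and this is not the actual change from the linked PR:)

const { promises: fsPromises } = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical stand-in for the internal lookup that was hitting the
// filesystem on every request.
const loadSharedConfig = () => {
  const configPath = path.join(os.homedir(), ".aws", "config");
  return fsPromises.readFile(configPath, "utf8").catch(() => "");
};

// Sketch: memoize the provider so the file is read at most once per process
// and every later caller reuses the same pending/resolved promise.
const memoizeOnce = (provider) => {
  let cached;
  return () => {
    if (!cached) {
      cached = provider();
    }
    return cached;
  };
};

const getSharedConfig = memoizeOnce(loadSharedConfig);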

@github-actions

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 24, 2021