CredentialsProviderError despite credentials being available #4867

Closed
nwalters512 opened this issue Jun 21, 2023 · 11 comments
Labels: bug This issue is a bug. p2 This is a standard priority issue

@nwalters512 (Contributor) commented Jun 21, 2023


Describe the bug

I have an application running on an EC2 instance. The instance has a role associated with it, and my application uses that role to interact with AWS. I construct clients in the normal way:

import { S3 } from '@aws-sdk/client-s3';
const s3 = new S3({ region: 'ca-central-1' });

Virtually all of the time, this works fine, and my application can successfully perform S3 operations. However, sometimes the SDK appears to get into a bad state. Without any change to my application code, the EC2 instance, IAM roles, or anything else, S3 operations via the SDK begin failing with the following error:

CredentialsProviderError: Could not load credentials from any providers

However, as far as I can tell, credentials should absolutely still be available. If I connect to the instance while these failures are occurring, I can run the following and get back credentials (note that PLWebServerRole is the name of the role attached to my instance):

TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` && curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/iam/security-credentials/PLWebServerRole

Moreover, if I open a Node shell and run the following code, I also get back credentials:

const { fromInstanceMetadata } = require("@aws-sdk/credential-provider-imds");
const provider = fromInstanceMetadata();
provider().then(console.log).catch(console.error);

All the while, my application continues experiencing failures. This is ultimately fixed by restarting my application.

SDK version number

3.342.0

Which JavaScript Runtime is this issue in?

Node.js

Details of the browser/Node.js/ReactNative version

Node 16.20.0

Reproduction Steps

This appears to be an edge case that has something to do with transient failure of IMDSv2, and so I am unfortunately unable to provide a self-contained reproduction. I'm doing my best to provide as much information as I possibly can.

Observed Behavior

Operations fail with the following error, despite the fact that credentials are in fact available from IMDS:

CredentialsProviderError: Could not load credentials from any providers

Expected Behavior

Credentials should be obtained from IMDS and used during API operations.

Possible Solution

As best I can tell, the AWS SDK's credential-loading machinery is caching errors (or not correctly retrying in the face of errors), probably from transient timeouts while talking to IMDS. This caching appears to be global: even though I construct a new S3 client object for each operation, operations continue to fail even after I've confirmed that IMDS is returning valid credentials.

I've tried reading through the code, but there are so many layers of indirection that I found it difficult to figure out where the invalid state might be stored.

Additional Information/Context

I see that similar issues have been discussed previously in #4679, but that issue was closed without a real resolution. It may have been a different underlying problem, since kube2iam was involved there.

AFAICT, this is a regression in the v3 SDK. We only started seeing these failures after migrating our entire application from v2 to v3.

@nwalters512 nwalters512 added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 21, 2023
@nwalters512 (Contributor, Author)

I believe this may have been caused by constructing too many S3 clients in rapid succession, although I've yet to test a fix that reuses clients. I'm hypothesizing that IMDS may have undocumented rate limits that are triggered by making many calls in rapid succession; I was unable to find anything about this in the IMDS docs.

While this is somewhat fixable from the application side, I don't see any documented best practices indicating that one should limit the rate at which clients are created. If this is indeed something to avoid, this should be documented.

Documentation aside, this seems like something the SDK should handle automatically, perhaps by sharing cached credentials between client instances. I believe the v2 SDK may have done this, which would explain why I only started seeing issues after switching to the v3 SDK.
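
For concreteness, here is a minimal sketch of the workaround I'm planning to test (untested against this failure; the headObject helper is just an illustrative example): construct the client, and therefore its credential provider, once at module scope and reuse it, rather than creating a new S3 object per operation.

import { S3 } from '@aws-sdk/client-s3';

// Constructed once at module scope, so credentials are resolved from IMDS once
// and refreshed near expiry, instead of being re-resolved for every operation.
const s3 = new S3({ region: 'ca-central-1' });

// All callers reuse the same client and its cached credentials.
export async function headObject(bucket, key) {
  return s3.headObject({ Bucket: bucket, Key: key });
}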

@yenfryherrerafeliz (Contributor)

Hi @nwalters512, the IMDS documentation says that you need to be mindful and avoid making a high number of requests to the metadata service, otherwise you can hit throttling errors. The relevant guidance is:

We throttle queries to the IMDS on a per-instance basis, and we place limits on the number of simultaneous connections from an instance to the IMDS.

If you're using the IMDS to retrieve AWS security credentials, avoid querying for credentials during every transaction or concurrently from a high number of threads or processes, as this might lead to throttling. Instead, we recommend that you cache the credentials until they start approaching their expiry time. For more information about IAM role and security credentials associated with the role, see Retrieve security credentials from instance metadata.

If you are throttled while accessing the IMDS, retry your query with an exponential backoff strategy.

I know the error from the SDK does not make it clear whether the problem was exceeding the request rate limit when retrieving credentials from IMDS, and that makes this harder to debug. The error is vague because you are using the default credential resolution chain, which keeps trying until one of the defined credential providers works. One thing you can do is specify the credential provider explicitly when instantiating the client, as follows; that way you get the error from that specific provider:

import { fromInstanceMetadata } from "@aws-sdk/credential-provider-imds";
import { S3Client } from "@aws-sdk/client-s3";

const client = new S3Client({
    region: "us-east-2",
    credentials: fromInstanceMetadata({})
})

You can also customize the number of retries made when fetching the credentials, which is 3 by default, as follows:

import { fromInstanceMetadata } from "@aws-sdk/credential-provider-imds";
import { S3Client } from "@aws-sdk/client-s3";

const client = new S3Client({
    region: "us-east-2",
    credentials: fromInstanceMetadata({
        maxRetries: 10
    })
})

I hope this helps!

Thanks!

@yenfryherrerafeliz yenfryherrerafeliz added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Jul 10, 2023
@yenfryherrerafeliz yenfryherrerafeliz self-assigned this Jul 10, 2023
@github-actions (bot)

This issue has not received a response in 1 week. If you still think there is a problem, please leave a comment to keep the issue from automatically closing.

@github-actions github-actions bot added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Jul 18, 2023
@WayneEllery

I've also hit this error. We create the S3 client only once, and this has happened once so far. The error occurred on send.

@nwalters512 (Contributor, Author)

@yenfryherrerafeliz thanks for the response and the link to the IMDS documentation! This sounds like something that should be present in SDK documentation, especially in documentation for upgrading from the v2 SDK where this same problem wouldn't necessarily occur by default.

I don't have any interest in overriding the default credential chain; I want to keep the option of resolving credentials in all the ways supported by the default chain. Is there no possibility of automatically sharing credential providers/chains between different clients? The fact that this sort of "just worked" in the v2 SDK is really attractive.
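
For reference, the closest thing I can see to doing this manually is something like the sketch below, assuming @aws-sdk/credential-provider-node exposes the same defaultProvider the clients use internally (the DynamoDB client is just an illustrative second service). I'm not even certain this actually shares the resolved credentials, since each client may still memoize the provider separately, which is exactly why built-in support would be nice:

import { defaultProvider } from '@aws-sdk/credential-provider-node';
import { S3Client } from '@aws-sdk/client-s3';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

// One provider for the whole process; it still walks the full default chain
// (env vars, shared config files, IMDS, ...) but is only constructed once.
const credentials = defaultProvider();

// Both clients are handed the same provider function rather than each
// building their own chain.
const s3 = new S3Client({ region: 'ca-central-1', credentials });
const dynamodb = new DynamoDBClient({ region: 'ca-central-1', credentials });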

@github-actions github-actions bot removed closing-soon This issue will automatically close in 4 days unless further comments are made. response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. labels Jul 20, 2023
@ruiwei commented Aug 4, 2023

I have been seeing the same error for a while. It's random, and sometimes the error message is different (see below). I have also occasionally seen the "Could not load credentials from any providers" error in the GitHub Action aws-actions/configure-aws-credentials@v2.

{
    "tryNextLink": true,
    "name": "ProviderError",
    "errno": -22,
    "code": "EINVAL",
    "syscall": "connect",
    "address": "169.254.169.254",
    "port": 80,
    "$metadata": {
        "attempts": 1,
        "totalRetryDelay": 0
    }
}

@RanVaknin (Contributor)

Hi @nwalters512 ,

IMDS should "just work" out of the box; you don't need to specify the provider explicitly to get it to work. The credential chain is implemented in a similar fashion between v2 and v3, so I don't see a need to document this in the migration guide. However, if you have a suggestion on how to update the docs, please submit a separate issue labeled documentation with the proposed changes.

Is there no possibility of automatically sharing credential providers/chains between different clients?

It is recommended to use one client in the global scope (or at least outside the context of the function that makes the calls) and make all service calls from that single client, like so:

import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

+const client = new S3Client({})

async function putObject(){
-    const client = new S3Client({})

    try{
        const res = await client.send(new PutObjectCommand({...}));
        console.log(res);
    } catch (e){
        console.log("error:", e)
    }
}

If you create a client every time you need to make a call, the SDK will attempt to resolve credentials for every such request, which can cause throttling errors when getting credentials from STS.

For everyone else responding here: this error can arise for multiple different reasons, so I suggest you each submit individual issues so we can assist on a case-by-case basis.

Thanks,
Ran~

@RanVaknin RanVaknin added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Aug 4, 2023
@nwalters512 (Contributor, Author)

The piece that I feel needs documentation is specifically the recommendation to avoid creating multiple clients (as in one per function call). Doing so appears to have been valid in v2, and I don't see any migration documentation warning about the problems with doing so.

@github-actions github-actions bot removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Aug 8, 2023
@WarrenWongCodes
Was this issue resolved? I have tried setting maxRetries to 3, but I still receive the following error:

{"tryNextLink":true,"name":"ProviderError","$metadata":{"attempts":1,"totalRetryDelay":0}}

I have tried both '@aws-sdk/credential-providers' and '@aws-sdk/credential-provider-imds'.

@RanVaknin (Contributor)

Hi @nwalters512 ,

Thanks for the feedback! After looking at our README, I can see where the confusion comes from. I have created a backlog item for us to update the main README with some better examples.

Since your initial concern was answered, I feel confident we can close this.

All the best,
Ran~

@github-actions (bot)

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 26, 2023