Intermittent "EC2 Metadata roleName request returned error" (EINVAL) on ECS Fargate #3284
Comments
Hey @summera, thank you for reaching out to us. This is very hard to reproduce. Depending on your use case, is it possible to set your credentials explicitly so that the SDK doesn't touch the metadata service? I understand that should not be the workaround, but I would need something more concrete and reproducible to show to the service team. Would you be able to share your logs? |
Hi @ajredniwja. Thank you for the response. After the issue occurred, I updated the SDK to As for reproducing, I haven't seen this happen since upgrading and scaling up our minimum tasks. However, since this happened during high load when a lot of requests came in and therefore many parallel uploads to S3, I'm wondering if one or more of the following may be possibilities?
Do any of the above sound plausible? |
I cannot point you towards any of those with complete certainty because we don't have any concrete evidence. Can you use the following and collect logs for both cases? That way we can compare and come to some conclusion: NODE_DEBUG=cluster,net,http,fs,tls,module,timers node app.js |
Makes sense, though I was only asking about plausibility. If any of those are not plausible, it makes it easier to focus efforts.
Which two cases are you referring to exactly? |
I was talking about the case where you see the error and the case where you don't, but I think that might be very hard to catch since this is an intermittent error. |
Yea, as I mentioned above, I haven't seen this happen since updating the SDK and increasing the minimum ECS tasks by one, so I don't have any logs to share of this happening again. The fact that it was intermittent and is hard to reproduce is why I was asking what might be plausible to see if it's worth descending the rabbit hole and spending time to investigate further. |
Hi everyone, I'm having exactly the same issue @summera reported, with almost the same setup. It is very intermittent: we have 10-15 clusters receiving a few thousand requests, and the issue seems to occur about once a week, so it is very rare. I had to set CloudWatch alarms with a log filter to catch those, so I'm monitoring very closely. ECS Task, Fargate managed, nodejs 13 image built from The weirdest thing is that the issue occurs in a task that has been running for quite a long time, in the middle of a bunch of successful requests. That said, I would rule out any configuration issue, but not the SDK; however, the clues (for me) point to the ECS metadata service being unavailable for some reason. One detail is that we use New Relic on some apps, so the trace is faulted for debugging purposes. Any thoughts?
|
Hi, I am following this doc (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html) to select the region dynamically in AWS. When I tested the code on AWS ECS Fargate, it gave me the error below
However, it runs perfectly on ecs ec2 task. I use "aws-sdk": "^2.701.0". It's js code in a docker container. |
We had a few occurrences this week. @ajredniwja, do you believe it is better to open an internal ticket for this? Getting worried. |
Yes. Let's open a ticket. We will need to subscribe to developer support on their prod account. I think you can do that using your role; otherwise, use the root account. Could you do that, please?
|
Getting same issue @ajredniwja |
Same issue here, definitely think it has something to do with ECS Fargate; although, it does work on some of my S3 put object requests. I tried to disable this request w/
I don't use Using |
Same issue here. NodeJS running on fargate. SDK version 2.745.0
|
Seeing this exact issue as well. IAM role needs to be fixed |
Bump. We are seeing this too. ECS/Fargate and node.
|
I had the same problem. It cost me quite a headache because I had this running in AWS Fargate and debugging is not that easy there. The error means the JavaScript SDK cannot find the AWS credentials. My error was quite embarrassing: I just had a typo in my environment variables. My variable was So probably double check the names of your environment variables (or config files) |
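A quick way to rule out the typo scenario described above is to check the environment at startup. The following is a minimal sketch, not part of the SDK: `checkAwsEnv` is a hypothetical helper name, and the "lookalike" heuristic is an assumption made for illustration. The two variable names it expects, `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, are the documented names the SDK reads from the environment.

```javascript
// Hypothetical startup check (not an aws-sdk API): reports expected
// credential variables that are missing, plus similarly named variables
// that may be typos of them.
const EXPECTED_VARS = ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'];

function checkAwsEnv(env) {
  // Expected names that are unset or empty.
  const missing = EXPECTED_VARS.filter((name) => !env[name]);
  // Assumed heuristic: any other key mentioning ACCESS or SECRET is suspicious.
  const lookalikes = Object.keys(env).filter(
    (name) => /ACCESS|SECRET/i.test(name) && !EXPECTED_VARS.includes(name)
  );
  return { missing, lookalikes };
}
```

Calling `checkAwsEnv(process.env)` when the container starts and logging the result makes a misspelled name visible without attaching a debugger to the Fargate task. This only applies when credentials are passed as environment variables; with a task role, neither variable is expected to be set.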
@antonpirker you're supposed to be able to pass an IAM role to a Task's containers in ECS, meaning you should be able to use the Node SDK w/o relying on access/secret IAM keys. https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html |
I encountered the same error and have been trying to fix it. My guess: is the ENI temporarily (or consistently) down? I focused on the 169.254.... IP address in that error. In my case, once the error happened, other AWS API calls (not only S3 put) showed the same behavior. |
Hey everyone, if there is a reproducible case, can you please share it? An internal ticket was opened for this, but no reproducible case was provided. It seems to happen under high memory/CPU usage, so retrying the request is worth considering. |
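The retry suggestion above can be sketched as a small wrapper around any promise-returning call. This is an illustration under stated assumptions, not the SDK's own retry logic: `withRetry` and `backoffDelayMs` are hypothetical helper names, and the default retry count and delays are arbitrary values chosen for the example.

```javascript
// Hypothetical helpers (not aws-sdk APIs): retry an async operation with
// capped exponential backoff.

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs.
function backoffDelayMs(attempt, baseMs = 100, capMs = 5000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function withRetry(operation, options = {}) {
  const { retries = 3, baseMs = 100, capMs = 5000, shouldRetry = () => true } = options;
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Stop when out of attempts or when the caller says the error is not transient.
      if (attempt === retries || !shouldRetry(err)) break;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt, baseMs, capMs)));
    }
  }
  throw lastError;
}
```

With the v2 SDK, an upload could then be wrapped as `withRetry(() => s3.putObject(params).promise())`, where `s3` and `params` are whatever the application already uses. Whether to retry every error or only credential lookup failures is a design choice; a `shouldRetry` predicate that inspects the error message would be one way to narrow it.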
I can either enable or disable my AWS_CONFIG_FILE with the same result. I'm also using AWS.config.update() to update my credentials every time my lambda runs. So I have credentials in both the recommended credentials file and I'm explicitly updating them on the fly to something that worked last week. I'm trying to trigger a lambda from my invocation lambda. In short, PHP sends a cURL request to invokeLambda then the invoker triggers a cron lambda to run instantly. I'm attempting to run all of this locally and it worked in the past, but I haven't found a reason that enabled it to work based on the current issue I'm encountering. I wouldn't consider it intermittent, but something takes place where AWS can load the credentials properly. I think I got lucky by doing specific unknown action versus it magically gets the credentials or it doesn't. @ajredniwja I can hop on a call and we can do debugging together if necessary. Update: |
I also took another route trying SQS/SNS locally. Got all the streams and connection points tied together using AWS CLI. @ajredniwja I'm able to reproduce this in at least two different ways now. |
FYI, it may not be a reasonable solution for all, but I confirmed that ECS Fargate works just fine using the v3 AWS Node SDK which came out in General Availability on 12/15: |
I'll follow up with the fix for me; I needed to explicitly set up |
I use environment variables to pass in the AWS keys and following the naming convention from their docs solved the problem for me. SDK will automatically detect and load the environment variables:
Docker Image: node:14.15.4-buster |
Still having this issue with 2.876.0. is there a way to install aws-sdk v3 via npm? UPDATE: I fixed it with setting |
We upgraded from NodeJS 12 to 14 and had a successful run after that. We cannot say whether this is just coincidental or whether it is due to the new NodeJS version. UPDATE: The problem appeared again, so NodeJS 14 is not the solution. 😞 |
i run my code fine on my computer, but get this error when i'm using EC2
docker node version: node:14.15.4-buster UPDATE:
-> outside the container : container itself. |
seeing this occasionally in some task too |
I had a similar issue, shown below, when running Directus on Fargate |
Hi, we have been facing these issues too for the past 2 weeks. It randomly starts when we try to emit SNS events. Instead of looking for credentials in the ECS metadata endpoint, the SDK looks for them in the EC2 metadata endpoint, which only has permissions to pull Docker images. aws-sdk@2.967.0
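To check which credential source a container is expected to use, the environment can be inspected directly. The sketch below is a simplified diagnostic model, not the SDK's actual provider chain (which includes more providers, such as shared config files). `describeCredentialSource` is a hypothetical name. The facts it relies on are that ECS injects `AWS_CONTAINER_CREDENTIALS_RELATIVE_URI` into containers of tasks that have a task role, that task credentials are served from 169.254.170.2, and that the EC2 instance metadata service lives at 169.254.169.254.

```javascript
// Simplified diagnostic model (an assumption, not the SDK's real chain):
// guess which credential source applies from the environment alone.
const ECS_CREDENTIALS_HOST = 'http://169.254.170.2';
const EC2_IMDS_HOST = 'http://169.254.169.254';

function describeCredentialSource(env) {
  if (env.AWS_ACCESS_KEY_ID && env.AWS_SECRET_ACCESS_KEY) {
    return { source: 'environment', url: null };
  }
  if (env.AWS_CONTAINER_CREDENTIALS_RELATIVE_URI) {
    // Set by ECS when the task definition has a task role.
    return {
      source: 'ecs-task-role',
      url: ECS_CREDENTIALS_HOST + env.AWS_CONTAINER_CREDENTIALS_RELATIVE_URI,
    };
  }
  // Nothing else found: a lookup would fall through to EC2 instance metadata.
  return {
    source: 'ec2-instance-metadata',
    url: EC2_IMDS_HOST + '/latest/meta-data/iam/security-credentials/',
  };
}
```

Logging `describeCredentialSource(process.env)` inside the task shows whether the relative URI variable is present at all. If it is missing, the task definition may lack a task role, which would be consistent with lookups landing on the EC2 endpoint.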
|
Did someone resolve this issue? I'm facing the same |
We figured it's some kind of timeout between instance creation and usage of SQS. As a workaround we use |
Facing the same issue. Could somebody help here? |
@ruchisharma189 In my experience/case, the intermittent issue was caused by a very high throughput on the metadata API. Metadata API is used by the SDK to retrieve the execution role at every service initialization. Few hundreds of By optimizing the SDK services initialization (caching) and later on, migrating to Hope it helps :) |
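The caching idea mentioned above can be sketched as a memoizing factory, so that each service client is constructed once and reused instead of being re-created per request. `createClientCache` is a hypothetical helper, and keying clients by a string such as the region is an assumption made for the example.

```javascript
// Hypothetical memoizing factory (not an aws-sdk API): builds each client
// once per key and returns the same instance on later calls.
function createClientCache(factory) {
  const cache = new Map();
  return function getClient(key) {
    if (!cache.has(key)) {
      cache.set(key, factory(key));
    }
    return cache.get(key);
  };
}
```

With the v2 SDK this could be used as `const getS3 = createClientCache((region) => new AWS.S3({ region }));`, after which request handlers call `getS3('us-east-1')` rather than constructing a new client each time. Whether this reduces metadata traffic in a given application depends on how its clients and credentials were being created before.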
Same here (node:49) MetadataLookupWarning: received unexpected error = request to http://169.254.169.254/computeMetadata/v1/instance failed, reason: connect EINVAL 169.254.169.254:80 - Local (0.0.0.0:0) code = EINVAL Running containers on AWS fargate. What does this mean? Any info is appreciated |
@sheyDev Did you come right with this? Currently experiencing the same issue & struggling to narrow down the cause. |
Hi there, I came across this very old issue while grooming our old backlog. To fetch credentials from the IMDS endpoint the SDK makes a standard HTTP request to Since no one here was able to provide a reproducible use case, there is not a lot for us to go on. With v2 being put into maintenance mode, and multiple people on this thread confirming that this is not an issue with v3 (likely because of a more robust retry policy), I'm inclined to close this issue. Thanks, |
Describe the bug
I am running a node 12.16 app on ECS Fargate. It's performing operations on files in S3 - streaming from a source bucket and uploading to a destination bucket. About 5 hours ago I started to see the following error when uploading to the destination bucket:
It happened for several minutes and then stopped. Then happened again for a couple minutes about an hour ago and stopped. So it's intermittent. This seems very similar to what was reported in #2534 (comment) and asked on the forum here, but has received no answer. I'm using a task role that has PUT permissions on the destination bucket. As I said, this is intermittent so when it's not happening, everything is working as it should. For some reason, it seems that there is an issue pulling credentials from the metadata service. I'm going to update the SDK to the latest to see if that resolves it but I didn't see anything in the changelog that would indicate it would. Any guidance would be greatly appreciated. Thanks!
Is the issue in the browser/Node.js?
Node.js
If on Node.js, are you running this on AWS Lambda?
No
SDK version number
v2.647.0