
SQS ERROR : ENOTFOUND & EMFILE #145

Closed
saiaman opened this issue Aug 8, 2013 · 10 comments

@saiaman

saiaman commented Aug 8, 2013

Hello,
While running an SQS worker, I'm having a lot of trouble with two errors:

Error on deleteMessage { [NetworkingError: getaddrinfo ENOTFOUND]
code: 'NetworkingError',
errno: 'ENOTFOUND',
syscall: 'getaddrinfo',
retryable: true,
name: 'NetworkingError',
statusCode: undefined }

Error on deleteMessage { [NetworkingError: connect EMFILE]
code: 'NetworkingError',
errno: 'EMFILE',
syscall: 'connect',
retryable: true,
name: 'NetworkingError',
statusCode: undefined }

Instances are on AWS OpsWorks, but the same occurs on a bare-metal server.

@lsegal

lsegal commented Aug 8, 2013

Can you show which region and endpoint you are connecting to? You can print them from the callback like this:

sqs.someOperation(function (err, data) {
  console.log('region:', this.request.httpRequest.region);
  console.log('endpoint:', this.request.httpRequest.endpoint.hostname);
});

@lsegal

lsegal commented Aug 9, 2013

FYI the above commit adds the region and endpoint hostname to the NetworkingError object so that they will be immediately visible when debugging these sorts of errors in the future, which should make things a little less magical. This will be in the next release (you can also use it directly by typing npm install git://github.com/aws/aws-sdk-js).

@nodefourtytwo

Hi,

I'm on the same team as Saiaman, here is the log after npm install git://github.com/aws/aws-sdk-js :

Error on receiveSQS { [NetworkingError: getaddrinfo ENOTFOUND]
code: 'NetworkingError',
errno: 'ENOTFOUND',
syscall: 'getaddrinfo',
region: 'eu-west-1',
hostname: 'sqs.eu-west-1.amazonaws.com',
retryable: true,
name: 'NetworkingError',
statusCode: undefined }

It seems that, from the worker, we can neither ping nor tracert the hostname sqs.eu-west-1.amazonaws.com, even though the security group configuration looks OK.

@nodefourtytwo

I'll add that it doesn't happen right away, but only after the worker has been running for some time.

@lsegal

lsegal commented Aug 12, 2013

@nodefourtytwo the EMFILE error certainly makes sense if it happens after a while. The EMFILE error means you are trying to open too many file handles, or, in this case, sockets, which can really only happen after a while. Googling for EMFILE shows a few StackOverflow questions with similar Node.js issues.

Are you running SQS against a large number of concurrent items? Have you changed the Agent.maxSockets by any chance? That would certainly cause the EMFILE errors. ENOTFOUND can be a related error, but I'm not entirely sure why you are seeing that one.

@nodefourtytwo

We did not touch Agent.maxSockets. But yes, we're quite heavy on read/write/delete on this queue.

Does it mean we're getting close to the limits of what an SQS queue can take?

@lsegal

lsegal commented Aug 13, 2013

Does it mean we're getting close to the limits of what an SQS queue can take?

Not necessarily; it means you are maxing out your local system resources. If you were hitting SQS limits, you would see throttling errors. See the links from the Google search above, or this specific StackOverflow question, for instance. You can potentially tune your OS to increase the file handle limits, but do so carefully: raising these limits can strain the I/O limits of your hardware and reduce overall performance (the ENOTFOUND error seems to imply this, but I cannot confirm without more information).

@nodefourtytwo

ulimit is set to 65536 open files.

The problem happens, as far as we can tell, only on medium instances not on micro instances.

@lsegal

lsegal commented Aug 20, 2013

The fact that it happens on medium instances and not on micro instances likely means that you have enough memory and processing power to actually saturate your open file limit; the micros are probably just not reaching it. If you need a higher file limit you can look at bumping it, though you might also want to add more instances to your cluster to process these messages.

I am going to close this issue since it seems to be very much related to the environment, and there is little we can do in the SDK about the open file limits on your OS. Feel free to open a new issue if you conclude that the SDK is doing something it shouldn't be doing in order to cause this problem, such as leaking file handles.

@lock

lock bot commented Sep 30, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
