
SQS ERROR : ENOTFOUND & EMFILE #145

Closed
saiaman opened this issue Aug 8, 2013 · 10 comments

@saiaman

saiaman commented Aug 8, 2013

Hello,
While running an SQS worker, I'm having a lot of trouble with two errors:

Error on deleteMessage { [NetworkingError: getaddrinfo ENOTFOUND]
code: 'NetworkingError',
errno: 'ENOTFOUND',
syscall: 'getaddrinfo',
retryable: true,
name: 'NetworkingError',
statusCode: undefined }

Error on deleteMessage { [NetworkingError: connect EMFILE]
code: 'NetworkingError',
errno: 'EMFILE',
syscall: 'connect',
retryable: true,
name: 'NetworkingError',
statusCode: undefined }

Instances are on AWS OpsWorks, but the same occurs on a bare-metal server.

@lsegal

lsegal commented Aug 8, 2013

Can you show which region and endpoint you are connecting to? You can print them from the callback like this:

sqs.someOperation(function (err, data) {
  console.log('region:', this.request.httpRequest.region);
  console.log('endpoint:', this.request.httpRequest.endpoint.hostname);
});

@lsegal

lsegal commented Aug 9, 2013

FYI the above commit adds the region and endpoint hostname to the NetworkingError object so that they will be immediately visible when debugging these sorts of errors in the future, which should make things a little less magical. This will be in the next release (you can also use it directly by typing npm install git://github.com/aws/aws-sdk-js).

@nodefourtytwo

Hi,

I'm on the same team as Saiaman, here is the log after npm install git://github.com/aws/aws-sdk-js :

Error on receiveSQS { [NetworkingError: getaddrinfo ENOTFOUND]
code: 'NetworkingError',
errno: 'ENOTFOUND',
syscall: 'getaddrinfo',
region: 'eu-west-1',
hostname: 'sqs.eu-west-1.amazonaws.com',
retryable: true,
name: 'NetworkingError',
statusCode: undefined }

It seems that, from the worker, we can neither ping nor tracert the hostname sqs.eu-west-1.amazonaws.com, even though the security group configuration looks OK.

@nodefourtytwo

I'll add that it doesn't happen right away, but only after the worker has been running for some time.

@lsegal

lsegal commented Aug 12, 2013

@nodefourtytwo the EMFILE error certainly makes sense if it happens after a while. The EMFILE error means you are trying to open too many file handles, or, in this case, sockets, which can really only happen after a while. Googling for EMFILE shows a few StackOverflow questions with similar Node.js issues.

Are you running SQS against a large number of concurrent items? Have you changed the Agent.maxSockets by any chance? That would certainly cause the EMFILE errors. ENOTFOUND can be a related error, but I'm not entirely sure why you are seeing that one.

@nodefourtytwo

We did not touch Agent.maxSockets. But yes, we're quite heavy on read/write/delete on this queue.

Does it mean we're getting close to the limits of what an SQS queue can take?

@lsegal

lsegal commented Aug 13, 2013

Does it mean we're getting close to the limits of what an SQS queue can take?

Not necessarily; it means you are maxing out your local system resources. If you were hitting SQS limits, you would see throttling errors. See the links from the Google search above, or this specific StackOverflow question, for instance. You can potentially tune your OS to increase the file handle limits, but do so carefully: raising these limits can strain the I/O limits of your hardware and reduce overall performance (the ENOTFOUND error seems to imply this, but I cannot confirm without more information).

@nodefourtytwo

ulimit is set to 65536 open files.

The problem happens, as far as we can tell, only on medium instances not on micro instances.

@lsegal

lsegal commented Aug 20, 2013

The fact that it happens on medium instances and not on micro instances likely means that you have enough memory and processing power to actually saturate your open file limit; the micros are probably just not reaching it. If you need a higher file limit you can look at bumping it, though you might also want to add more instances to your cluster to process these messages.

I am going to close this issue since it seems to be very much related to the environment, and there is little we can do in the SDK about the open file limits on your OS. Feel free to open a new issue if you conclude that the SDK is doing something it shouldn't be doing in order to cause this problem, such as leaking file handles.

@lock

lock bot commented Sep 30, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs and link to relevant comments in this thread.
