
Running entire suite of 'grunt functional' results in 'EMFILE, too many open files' error #148

Closed
michellai opened this issue Feb 8, 2013 · 19 comments
Labels: Bug (a problem that needs fixing)

Comments

@michellai

The flick gesture test passes consistently when run via this specific command:
$ mocha -t 90000 -R spec test/functional/uicatalog/flick.js
flick gesture
✓ should work via webdriver method (4194ms)
✓ should work via mobile only method (4179ms)
✓ should work via mobile only method with percentage (4186ms)
...
8 tests complete (2 minutes)

However, when running the same tests as part of the larger 'grunt functional' suite, I am consistently blocked by this file handle error:
$ grunt functional
Running "functional" task

active
✓ should return active element (3319ms)
...
flick gesture
✓ should work via webdriver method (4235ms)
◦ should work via mobile only method:
/Users/mlai/dev/appium/node_modules/winston/lib/winston/exception.js:29
cwd: process.cwd(),
^
Error: EMFILE, too many open files
at Object.exception.getProcessInfo (/Users/mlai/dev/appium/node_modules/winston/lib/winston/exception.js:29:26)
at Object.exception.getAllInfo (/Users/mlai/dev/appium/node_modules/winston/lib/winston/exception.js:17:24)
at Logger._uncaughtException (/Users/mlai/dev/appium/node_modules/winston/lib/winston/logger.js:600:24)
at process.EventEmitter.emit (events.js:126:20)

Thanks to jlipps for confirming this issue as a problem with appium or one of its dependencies failing to release Node's file handles.

@bootstraponline
Member

I ran into the same error on OS X when writing tests.

/usr/local/lib/node_modules/appium/node_modules/winston/lib/winston/exception.js:29
    cwd:         process.cwd(),
                         ^
Error: EMFILE, too many open files
    at Object.exception.getProcessInfo (/usr/local/lib/node_modules/appium/node_modules/winston/lib/winston/exception.js:29:26)
    at Object.exception.getAllInfo (/usr/local/lib/node_modules/appium/node_modules/winston/lib/winston/exception.js:17:24)
    at Logger._uncaughtException (/usr/local/lib/node_modules/appium/node_modules/winston/lib/winston/logger.js:600:24)
    at process.EventEmitter.emit (events.js:126:20)

@jlipps
Member

jlipps commented Feb 12, 2013

Hi guys,

Try running this in your shell:

ulimit -n 4096

And then run the testsuite again to see if that helps! (Don't forget to pull and reset your npm dependencies...)

@ghost ghost assigned jlipps and santiycr Feb 12, 2013
@bootstraponline
Member

Appium hasn't crashed yet after running that command.

@jlipps
Member

jlipps commented Feb 12, 2013

I had to bump my ulimit to higher than 4096 for this to work, but it did eventually. My new theory is that it is instruments writing tracedirs with tons of files that causes these errors. The issue would then be that grunt is spawning many instruments sessions via appium, so that all these have the same parent process. Not sure if there's a clean way around this. For now, I'll keep bumping my ulimit and we can reopen this if we run into a tighter spot sometime in the future.

@gaige
Contributor

gaige commented Feb 12, 2013

I have tracked this down to half-open sockets on our side (in particular /tmp/instruments). If you run lsof -p against the node.js process, you can see that hundreds are left around.

In particular, 'allowHalfOpen' is turned on when the sockets are listened to, but conn.end() is never called to close them up after the data is sent back.

I'm working on a patch for this, and I'll put in a pull request if it passes all of the tests.
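
For reference, a minimal sketch of the pattern described above, assuming a made-up socket path and reply payload rather than the actual Appium code: with allowHalfOpen set, each connection's fd stays open until the server side explicitly calls conn.end().

```js
// Sketch only: a Unix-domain-socket server with allowHalfOpen enabled.
// The socket path and reply payload are illustrative assumptions.
var net = require('net');

var server = net.createServer({ allowHalfOpen: true }, function (conn) {
  var chunks = [];
  conn.on('data', function (d) { chunks.push(d); });
  conn.on('end', function () {
    // The client has finished sending; handle the command and reply.
    conn.write(JSON.stringify({ status: 0, value: chunks.join('') }));
    conn.end(); // without this, the half-open socket (and its fd) leaks
  });
});

server.listen('/tmp/example_sock');
```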

@jlipps
Member

jlipps commented Feb 12, 2013

Brilliant! I had some theories about instruments but didn't think about half-open sockets. When I wrote that code I wondered what the side effects might be :-) Your patch will be greatly appreciated.

@jlipps jlipps reopened this Feb 12, 2013
@bootstraponline
Member

It just crashed again. I'm also looking forward to the fix.

@gaige
Contributor

gaige commented Feb 12, 2013

Taking a little longer to completely remove it..... soon

@gaige
Contributor

gaige commented Feb 12, 2013

Well, it's definitely a bit more pernicious than I thought it was, and it's actually a larger problem, since the sockets will sit around for as long as the server runs. It's starting to look like there is a long-term reference being held somewhere and it isn't being closed. So far, the only place where the connection appears to be held is in the cmd object in the command queue.
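
As an illustration of that kind of retention (a sketch under assumed names, not Appium's actual command queue): as long as a finished command object, or the queue holding it, still points at its connection, the socket stays reachable and its descriptor is never released.

```js
// Sketch only: `cmd.execute` and the queue shape are assumptions for
// illustration, not Appium internals.
var commandQueue = [];

function runNextCommand() {
  var cmd = commandQueue[0];
  if (!cmd) return;
  cmd.execute(function (result) {
    cmd.conn.write(result);
    cmd.conn.end();
    cmd.conn = null;        // drop the socket reference so it can be collected
    commandQueue.shift();   // and drop the finished command from the queue
    runNextCommand();
  });
}
```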

@gaige
Contributor

gaige commented Feb 13, 2013

This is taking quite a bit longer than originally expected. The crux of the matter is some pretty major leakage: it appears to be leaking IOS and Instruments objects until the entire process exits (confirmed both by inspecting the open files with lsof and by profiling for leaks with nodetime). There are a lot of closures in here, and it wouldn't surprise me if we had one somewhere that is retaining itself or otherwise creating a circular reference.

I'm going to continue trying to see if I can track this down.

@jlipps
Member

jlipps commented Feb 13, 2013

Thanks @gaige! Let me know if you need any insight into how things are architected. Sounds like you've got things pretty well mapped out by now though.

@gaige
Contributor

gaige commented Feb 13, 2013

At this point, I've definitely tracked down a few leaks and broken some cycles, which substantially reduces the number of leaked pipes. However, there is still basically one fd leak per instantiation of Instruments, despite the fact that all of the Instruments objects and processes are now appropriately disposed of.

At this point, all Socket classes are clearing, so I'm now suspecting one of two things:

  • one side or the other needs to manually perform a socket operation on the pipe fd
  • there's something broken inside of node.js dealing with this scenario

I have seen comments on other threads about leaked file descriptors with unix domain sockets when trying to interface with mysql and postgres (making me think it might be a general problem), but no indication of any specific bug found or work-around.

I'm going to toy with this a little longer and then put up the fixes that I made for this, since they're at least reducing the problem quite a bit.
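
One way to keep an eye on this while the suite runs (a rough helper of my own, not something in the Appium tree) is to count the node process's open descriptors with the same lsof approach mentioned above:

```js
// Rough fd-leak watcher: shells out to lsof for the current process.
var exec = require('child_process').exec;

function countOpenFds(cb) {
  exec('lsof -p ' + process.pid + ' | wc -l', function (err, stdout) {
    if (err) return cb(err);
    cb(null, parseInt(stdout.trim(), 10));
  });
}

// Example: log the count between test sessions to spot a steady climb.
countOpenFds(function (err, n) {
  if (!err) console.log('open file descriptors:', n);
});
```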

@michellai
Author

Thanks so much for your diligence on this issue, Gaige; I really appreciate it! Can't wait to pull the latest.


@jlipps
Member

jlipps commented Feb 14, 2013

Almost completely addressed in #187, which has been merged now. Thanks @gaige!

@gaige
Contributor

gaige commented Feb 14, 2013

HAH! I do believe I've finally tracked this puppy down.... more news in a minute, but so far I'm running functional and I've accumulated only the one open socket required to run it.

@jlipps
Member

jlipps commented Feb 14, 2013

awesome @gaige!


@gaige
Contributor

gaige commented Feb 14, 2013

This may be a bug in the node.js handling of domain sockets in general, because I've seen a bunch of other posts in various places (other projects, Stack Exchange) about people having stray domain sockets lying around. However, if we make sure that the server created by startServer() in the Instruments class is actually closed (forcing it to stop listening), then the socket goes away.

Interestingly, I had already stopped the leak of the Instruments object itself, and thus the server should have been destroyed, but there is some indication that destroying a server created with a domain socket does not successfully remove the socket from listening and might leave it open. I'm going to do some more leak checking to make sure there isn't anything else showing up, but so far the full set of functional tests indicates no fd leakage, at least.

I'm re-running tests and then will put in my pull request.
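
A minimal sketch of the fix described above, assuming a hypothetical startServer()/shutdownServer() pair rather than the real Instruments code: the domain-socket server has to be closed explicitly, and the socket file removed, for the descriptor to go away.

```js
// Sketch only: names and paths are illustrative, not the Appium implementation.
var net = require('net');
var fs = require('fs');

function startServer(sockPath, onConnection) {
  var server = net.createServer({ allowHalfOpen: true }, onConnection);
  server.listen(sockPath); // binds a Unix domain socket at sockPath
  return server;
}

function shutdownServer(server, sockPath, cb) {
  server.close(function () {
    // Stop listening, then remove the socket file so the path can be reused
    // and no stray fd is left behind.
    fs.unlink(sockPath, function () { if (cb) cb(); });
  });
}
```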

@gaige
Contributor

gaige commented Feb 14, 2013

#190 is now up for your reviewing pleasure. Please let me know if I've missed something. Otherwise, this should help stability in low-memory and low-fd situations.

@jlipps jlipps closed this as completed in ed432ee Feb 14, 2013
jlipps added a commit that referenced this issue Feb 14, 2013
Fix #148 leaving domain sockets open
@lock

lock bot commented May 4, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators May 4, 2019