
Running entire suite of 'grunt functional' results in 'EMFILE, too many open files' error #148

Closed
michellai opened this issue Feb 8, 2013 · 19 comments
Labels: Bug (a problem that needs fixing)

Comments

@michellai

The flick gesture test passes consistently when run via this specific command:
$ mocha -t 90000 -R spec test/functional/uicatalog/flick.js
flick gesture
✓ should work via webdriver method (4194ms)
✓ should work via mobile only method (4179ms)
✓ should work via mobile only method with percentage (4186ms)
...
8 tests complete (2 minutes)

However, when running the same tests as part of the larger 'grunt functional' suite, I am consistently blocked by this file handle error:
$ grunt functional
Running "functional" task

active
✓ should return active element (3319ms)
...
flick gesture
✓ should work via webdriver method (4235ms)
◦ should work via mobile only method:
/Users/mlai/dev/appium/node_modules/winston/lib/winston/exception.js:29
cwd: process.cwd(),
^
Error: EMFILE, too many open files
at Object.exception.getProcessInfo (/Users/mlai/dev/appium/node_modules/winston/lib/winston/exception.js:29:26)
at Object.exception.getAllInfo (/Users/mlai/dev/appium/node_modules/winston/lib/winston/exception.js:17:24)
at Logger._uncaughtException (/Users/mlai/dev/appium/node_modules/winston/lib/winston/logger.js:600:24)
at process.EventEmitter.emit (events.js:126:20)

Thanks to jlipps for confirming this issue as a problem with appium or one of its dependencies failing to release Node's file handles.

@bootstraponline
Member

I ran into the same error on OS X when writing tests.

/usr/local/lib/node_modules/appium/node_modules/winston/lib/winston/exception.js:29
    cwd:         process.cwd(),
                         ^
Error: EMFILE, too many open files
    at Object.exception.getProcessInfo (/usr/local/lib/node_modules/appium/node_modules/winston/lib/winston/exception.js:29:26)
    at Object.exception.getAllInfo (/usr/local/lib/node_modules/appium/node_modules/winston/lib/winston/exception.js:17:24)
    at Logger._uncaughtException (/usr/local/lib/node_modules/appium/node_modules/winston/lib/winston/logger.js:600:24)
    at process.EventEmitter.emit (events.js:126:20)

@jlipps
Member

jlipps commented Feb 12, 2013

Hi guys,

Try running this in your shell:

ulimit -n 4096

And then run the testsuite again to see if that helps! (Don't forget to pull and reset your npm dependencies...)

@ghost ghost assigned jlipps and santiycr Feb 12, 2013
@bootstraponline
Member

Appium hasn't crashed yet after running that command.

@jlipps
Member

jlipps commented Feb 12, 2013

I had to bump my ulimit to higher than 4096 for this to work, but it did eventually. My new theory is that it is instruments writing tracedirs with tons of files that causes these errors. The issue would then be that grunt is spawning many instruments sessions via appium, so that all these have the same parent process. Not sure if there's a clean way around this. For now, I'll keep bumping my ulimit and we can reopen this if we run into a tighter spot sometime in the future.

@gaige
Contributor

gaige commented Feb 12, 2013

I have tracked this down to half-open sockets on our side (in particular /tmp/instruments). If you run lsof -p against the node.js process, you can see that hundreds are left around.

In particular, 'allowHalfOpen' is turned on when the sockets are listened to, but conn.end() is never called to close them up after the data is sent back.

I'm working on a patch for this, and I'll put in a pull request if it passes all of the tests.
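
For reference, a minimal sketch of the pattern described above, assuming a made-up socket path and reply payload rather than the actual Appium code: with allowHalfOpen set, each connection's fd stays open until the server side explicitly calls conn.end().

```js
// Sketch only: a Unix-domain-socket server with allowHalfOpen enabled.
// The socket path and reply payload are illustrative assumptions.
var net = require('net');

var server = net.createServer({ allowHalfOpen: true }, function (conn) {
  var chunks = [];
  conn.on('data', function (d) { chunks.push(d); });
  conn.on('end', function () {
    // The client has finished sending; handle the command and reply.
    conn.write(JSON.stringify({ status: 0, value: chunks.join('') }));
    conn.end(); // without this, the half-open socket (and its fd) leaks
  });
});

server.listen('/tmp/example_sock');
```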

@jlipps
Member

jlipps commented Feb 12, 2013

Brilliant! I had some theories about instruments but didn't think about half-open sockets. When I wrote that code I wondered what the side effects might be :-) Your patch will be greatly appreciated.

@jlipps jlipps reopened this Feb 12, 2013
@bootstraponline
Member

It just crashed again. I'm also looking forward to the fix.

@gaige
Contributor

gaige commented Feb 12, 2013

Taking a little longer to completely remove it..... soon

@gaige
Contributor

gaige commented Feb 12, 2013

Well, it's definitely a bit more pernicious than I thought it was, and it's actually a larger problem, since the sockets will sit around for as long as the server runs. It's starting to look like there is a long-term reference being held somewhere and it isn't being closed. So far, the only place where the connection appears to be held is in the cmd object in the command queue.
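
As an illustration of that kind of retention (a sketch under assumed names, not Appium's actual command queue): as long as a finished command object, or the queue holding it, still points at its connection, the socket stays reachable and its descriptor is never released.

```js
// Sketch only: `cmd.execute` and the queue shape are assumptions for
// illustration, not Appium internals.
var commandQueue = [];

function runNextCommand() {
  var cmd = commandQueue[0];
  if (!cmd) return;
  cmd.execute(function (result) {
    cmd.conn.write(result);
    cmd.conn.end();
    cmd.conn = null;        // drop the socket reference so it can be collected
    commandQueue.shift();   // and drop the finished command from the queue
    runNextCommand();
  });
}
```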

@gaige
Contributor

gaige commented Feb 13, 2013

This is taking quite a bit longer than originally expected. The crux of the matter is some pretty major leakage: it appears to be leaking IOS and Instruments objects until the entire process exits (confirmed both by inspecting the open files with lsof and by profiling for leaks with nodetime). There are a lot of closures in here, and it wouldn't surprise me if we had one somewhere that is retaining itself or otherwise creating a circular reference.

I'm going to continue trying to see if I can track this down.

@jlipps
Member

jlipps commented Feb 13, 2013

Thanks @gaige! Let me know if you need any insight into how things are architected. Sounds like you've got things pretty well mapped out by now though.

@gaige
Contributor

gaige commented Feb 13, 2013

At this point, I've definitely tracked down a few leaks and broken some cycles, which substantially reduces the number of leaked pipes. However, there is still basically one fd leak per instantiation of Instruments, despite the fact that all of the Instruments objects and processes are now appropriately disposed of.

At this point, all Socket classes are clearing, so I'm now suspecting one of two things:

  • one side or the other needs to manually perform a socket operation on the pipe fd
  • there's something broken inside of node.js dealing with this scenario

I have seen comments on other threads about leaked file descriptors with unix domain sockets when trying to interface with mysql and postgres (making me think it might be a general problem), but no indication of any specific bug found or work-around.

I'm going to toy with this a little longer and then put up the fixes that I made for this, since they're at least reducing the problem quite a bit.
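
One way to keep an eye on this while the suite runs (a rough helper of my own, not something in the Appium tree) is to count the node process's open descriptors with the same lsof approach mentioned above:

```js
// Rough fd-leak watcher: shells out to lsof for the current process.
var exec = require('child_process').exec;

function countOpenFds(cb) {
  exec('lsof -p ' + process.pid + ' | wc -l', function (err, stdout) {
    if (err) return cb(err);
    cb(null, parseInt(stdout.trim(), 10));
  });
}

// Example: log the count between test sessions to spot a steady climb.
countOpenFds(function (err, n) {
  if (!err) console.log('open file descriptors:', n);
});
```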

@michellai
Author

Thanks so much for your diligence on this issue, Gaige; I really appreciate it! Can't wait to pull the latest.


@jlipps
Member

jlipps commented Feb 14, 2013

Almost completely addressed in #187, which has been merged now. Thanks @gaige!

@gaige
Contributor

gaige commented Feb 14, 2013

HAH! I do believe I've finally tracked this puppy down.... more news in a minute, but so far I'm running functional and I've accumulated only the one open socket required to run it.

@jlipps
Member

jlipps commented Feb 14, 2013

awesome @gaige!


@gaige
Contributor

gaige commented Feb 14, 2013

This may be a bug in the node.js handling of domain sockets in general, because I've seen a bunch of other posts in various places (other projects, Stack Exchange) about people having stray domain sockets lying around. However, if we make sure that the server created by startServer() in the Instruments class is actually closed (forcing it to stop listening), then the socket goes away.

Interestingly, I had already stopped the leak of the Instruments object itself, and thus the server should have been destroyed, but there is some indication that destroying a server created with a domain socket does not successfully remove the socket from listening and might leave it open. I'm going to do some more leak checking to make sure there isn't anything else showing up, but so far the full set of functional tests indicates no fd leakage, at least.

I'm re-running tests and then will put in my pull request.
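
A minimal sketch of the fix described above, assuming a hypothetical startServer()/shutdownServer() pair rather than the real Instruments code: the domain-socket server has to be closed explicitly, and the socket file removed, for the descriptor to go away.

```js
// Sketch only: names and paths are illustrative, not the Appium implementation.
var net = require('net');
var fs = require('fs');

function startServer(sockPath, onConnection) {
  var server = net.createServer({ allowHalfOpen: true }, onConnection);
  server.listen(sockPath); // binds a Unix domain socket at sockPath
  return server;
}

function shutdownServer(server, sockPath, cb) {
  server.close(function () {
    // Stop listening, then remove the socket file so the path can be reused
    // and no stray fd is left behind.
    fs.unlink(sockPath, function () { if (cb) cb(); });
  });
}
```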

@gaige
Contributor

gaige commented Feb 14, 2013

#190 is now up for your reviewing pleasure. Please let me know if I've missed something. Otherwise, this should help stability in low-memory and low-fd situations.

@jlipps jlipps closed this as completed in ed432ee Feb 14, 2013
jlipps added a commit that referenced this issue Feb 14, 2013
Fix #148 leaving domain sockets open
@lock

lock bot commented May 4, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators May 4, 2019