java.io.IOException: error=24, Too many open files #141
Comments
That's what I was thinking too. I'll take a look tonight.
I'm running the example case right now and it doesn't seem to have more than 9 files open at a time (i.e. they don't seem to be accumulating). But maybe it is more of an issue when reading in lots of pre-existing QM files at once (maybe they are being closed automatically after a certain time, which is why I'm not noticing them in the run with all the calculations). I'll explore further.
It is certainly the case that my situation had very many pre-existing files. It also resumed from a restart, IIRC.
Restarting with pre-existing QMfiles seems to have even fewer files open at once: at most 4. I suppose it should be 1 (or maybe 2) open at once, though. I'll push a commit with some additional file-closing lines and see if that helps.
The additional file-closing lines seem to have helped noticeably. Now when I rerun (reading in pre-existing results) there are no QMfiles open most of the time when I run "lsof | grep /QMfiles/" on the compute node, and occasionally there is one. I'm marking this closed by abb90e4 for the time being, but if the problem recurs, please let me know.
This just occurred again.
Thermo for RDISLAOLCYWJGD-UHFFFAOYAP: -83.68 100.93 33.74 40.27 45.57 49.76 55.8 59.88 65.76
HBI-based thermo for NBUDTFKOWOCMPD-UHFFFAOYACmult5(InChI=1/C4O5/c5-1-2-7-9-4-3(6)8-4/mult5): 125.38 106.08 31.03 35.538.78 41.15 44.28 46.11 48.32
Pre-existing successful quantum result for DTKHHSNVCNRSOS-UHFFFAOYAX (InChI=1/C4H4O5/c5-1-2-7-9-4-3(6)8-4/h3-6H) has been found. This log file will be used.
java.io.IOException: Cannot run program "python": java.io.IOException: error=24, Too many open files
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
    at java.lang.Runtime.exec(Runtime.java:610)
    at java.lang.Runtime.exec(Runtime.java:448)
    at java.lang.Runtime.exec(Runtime.java:345)
    at jing.chem.QMTP.getPM3MM4ThermoDataUsingCCLib(QMTP.java:1662)
    at jing.chem.QMTP.parseGaussianPM3(QMTP.java:1457)
    at jing.chem.QMTP.generateQMThermoData(QMTP.java:313)
    at jing.chem.QMTP.generateThermoData(QMTP.java:148)
    at jing.chem.ChemGraph.generateThermoData(ChemGraph.java:1300)
    at jing.chem.ChemGraph.getThermoData(ChemGraph.java:1795)
    at jing.chem.Species.findStablestThermoData(Species.java:345)
    at jing.chem.Species.<init>(Species.java:118)
    at jing.chem.Species.make(Species.java:878)
    at jing.rxn.ReactionTemplate.reactTwoReactants(ReactionTemplate.java:1196)
    at jing.rxn.TemplateReactionGenerator.react(TemplateReactionGenerator.java:232)
    at jing.rxnSys.RateBasedPDepRME.addSpeciesToCore(RateBasedPDepRME.java:317)
    at jing.rxnSys.RateBasedPDepRME.enlargeReactionModel(RateBasedPDepRME.java:235)
    at jing.rxnSys.ReactionModelGenerator.enlargeReactionModel(ReactionModelGenerator.java:3963)
    at jing.rxnSys.ReactionModelGenerator.modelGeneration(ReactionModelGenerator.java:1452)
    at RMG.main(RMG.java:57)
Caused by: java.io.IOException: java.io.IOException: error=24, Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
    at java.lang.ProcessImpl.start(ProcessImpl.java:81)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
    ... 19 more
RMG was at 62d1a55
OK, I think I narrowed down the problem. It seems to be associated with pipe, which is accumulating instances as the run progresses:
gmagoon@node55:~$ lsof | grep pipe
java 14421 gmagoon 8w FIFO 0,8 0t0 624645 pipe
java 14421 gmagoon 10w FIFO 0,8 0t0 624657 pipe
java 14421 gmagoon 12w FIFO 0,8 0t0 624671 pipe
java 14421 gmagoon 13r FIFO 0,8 0t0 624672 pipe
java 14421 gmagoon 14w FIFO 0,8 0t0 624690 pipe
java 14421 gmagoon 15r FIFO 0,8 0t0 624673 pipe
java 14421 gmagoon 16r FIFO 0,8 0t0 624691 pipe
java 14421 gmagoon 17w FIFO 0,8 0t0 624709 pipe
java 14421 gmagoon 18r FIFO 0,8 0t0 624692 pipe
java 14421 gmagoon 19r FIFO 0,8 0t0 624710 pipe
java 14421 gmagoon 20w FIFO 0,8 0t0 624761 pipe
java 14421 gmagoon 21r FIFO 0,8 0t0 624711 pipe
java 14421 gmagoon 22w FIFO 0,8 0t0 624762 pipe
java 14421 gmagoon 23r FIFO 0,8 0t0 624786 pipe
java 14421 gmagoon 24w FIFO 0,8 0t0 624763 pipe
java 14421 gmagoon 25r FIFO 0,8 0t0 624787 pipe
java 14421 gmagoon 26w FIFO 0,8 0t0 624814 pipe
java 14421 gmagoon 27r FIFO 0,8 0t0 624788 pipe
java 14421 gmagoon 29r FIFO 0,8 0t0 624845 pipe
java 14421 gmagoon 31w FIFO 0,8 0t0 624913 pipe
java 14421 gmagoon 33r FIFO 0,8 0t0 624948 pipe
java 14421 gmagoon 35r FIFO 0,8 0t0 624985 pipe
java 14421 gmagoon 36w FIFO 0,8 0t0 624986 pipe
java 14421 gmagoon 37r FIFO 0,8 0t0 625027 pipe
java 14421 gmagoon 38w FIFO 0,8 0t0 624987 pipe
java 14421 gmagoon 39r FIFO 0,8 0t0 625028 pipe
java 14421 gmagoon 40w FIFO 0,8 0t0 625069 pipe
java 14421 gmagoon 41r FIFO 0,8 0t0 625029 pipe
java 14421 gmagoon 42w FIFO 0,8 0t0 625070 pipe
java 14421 gmagoon 44r FIFO 0,8 0t0 625071 pipe
g03 16381 gmagoon 2w FIFO 0,8 0t0 625071 pipe
sh 16382 gmagoon 2w FIFO 0,8 0t0 625071 pipe
l402.exe 16383 gmagoon 2w FIFO 0,8 0t0 625071 pipe
lsof 16387 gmagoon 1w FIFO 0,8 0t0 625237 pipe
lsof 16387 gmagoon 5w FIFO 0,8 0t0 625245 pipe
lsof 16387 gmagoon 6r FIFO 0,8 0t0 625246 pipe
grep 16388 gmagoon 0r FIFO 0,8 0t0 625237 pipe
lsof 16389 gmagoon 4r FIFO 0,8 0t0 625245 pipe
lsof 16389 gmagoon 7w FIFO 0,8 0t0 625246 pipe
gmagoon@node55:~$ grep pipe temp.txt
java 14421 gmagoon 8w FIFO 0,8 0t0 602047 pipe
java 14421 gmagoon 10w FIFO 0,8 0t0 602062 pipe
java 14421 gmagoon 12w FIFO 0,8 0t0 602076 pipe
java 14421 gmagoon 13r FIFO 0,8 0t0 602077 pipe
java 14421 gmagoon 14w FIFO 0,8 0t0 602092 pipe
java 14421 gmagoon 15r FIFO 0,8 0t0 602078 pipe
java 14421 gmagoon 16r FIFO 0,8 0t0 602093 pipe
java 14421 gmagoon 17w FIFO 0,8 0t0 602111 pipe
java 14421 gmagoon 18r FIFO 0,8 0t0 602094 pipe
java 14421 gmagoon 19r FIFO 0,8 0t0 602112 pipe
java 14421 gmagoon 21r FIFO 0,8 0t0 602113 pipe
g03 15064 gmagoon 2w FIFO 0,8 0t0 602113 pipe
sh 15065 gmagoon 2w FIFO 0,8 0t0 602113 pipe
l103.exe 15066 gmagoon 2w FIFO 0,8 0t0 602113 pipe
lsof 15067 gmagoon 1w FIFO 0,8 0t0 602152 pipe
lsof 15067 gmagoon 5w FIFO 0,8 0t0 602160 pipe
lsof 15067 gmagoon 6r FIFO 0,8 0t0 602161 pipe
grep 15068 gmagoon 0r FIFO 0,8 0t0 602152 pipe
lsof 15069 gmagoon 4r FIFO 0,8 0t0 602160 pipe
lsof 15069 gmagoon 7w FIFO 0,8 0t0 602161 pipe
When was the last time this occurred for you, Richard? I haven't been having any problems in my recent runs and I'm wondering if this can be closed.
Haven't tried it in a while. On Mar 24, 2011, at 10:23 AM, gmagoon wrote:
This just occurred again. Both
Pre-existing successful MOPAC quantum result for HEOXSTBRHHYDGD-UHFFFAOYAQ (InChI=1/C6H10O4/c1-3(4(2)7)6(9)5(8)10-6/h3,5,8-9H,1-2H3) has been found. This log file will be used.
Point group: C1
Thermo for HEOXSTBRHHYDGD-UHFFFAOYAQ: -153.02 104.79 42.51 52.55 61.26 68.46 79.31 87.02 98.73
ERROR: Error running cINChI-1: java.io.IOException: Cannot run program "/home/rwest/RMG-Java/bin/cInChI-1" (in directory "InChI"): java.io.IOException: error=24, Too many open files
HBI-based thermo for ZOQZHOVZVCMVDB-UHFFFAOYAUmult4(InChI=1/C6H7O4/c1-3(4(2)7)6(9)5(8)10-6/h3H,1-2H3/mult4): -0.1 104.48 39.78 48.59 56.13 62.28 71.39 77.68 86.89
ERROR: jing.chem.NASAFittingException: Error in running GATPFit: jing.chem.GATPFitException: Error running GATPFit
java.io.IOException: Cannot run program "/home/rwest/RMG-Java/bin/GATPFit.exe" (in directory "GATPFit"): java.io.IOException: error=24, Too many open files
To help diagnosis, writing GATPFit input to file GATPFit/INPUT.txt
    at jing.chem.GATPFit.generateNASAThermoData(GATPFit.java:251)
    at jing.chem.Species.generateNASAThermoData(Species.java:365)
    at jing.chem.Species.<init>(Species.java:123)
    at jing.chem.Species.make(Species.java:877)
    at jing.rxn.ReactionTemplate.reactTwoReactants(ReactionTemplate.java:1197)
    at jing.rxn.TemplateReactionGenerator.react(TemplateReactionGenerator.java:216)
    at jing.rxnSys.RateBasedPDepRME.addSpeciesToCore(RateBasedPDepRME.java:316)
    at jing.rxnSys.RateBasedPDepRME.enlargeReactionModel(RateBasedPDepRME.java:236)
    at jing.rxnSys.ReactionModelGenerator.enlargeReactionModel(ReactionModelGenerator.java:3888)
    at jing.rxnSys.ReactionModelGenerator.modelGeneration(ReactionModelGenerator.java:1459)
    at RMG.main(RMG.java:96)
CRITICAL: Error in running GATPFit: jing.chem.GATPFitException: Error running GATPFit
java.io.IOException: Cannot run program "/home/rwest/RMG-Java/bin/GATPFit.exe" (in directory "GATPFit"): java.io.IOException: error=24, Too many open files
To help diagnosis, writing GATPFit input to file GATPFit/INPUT.txt
(This is job fb1df6dbf3776ce9327a3065445035a97b2d5fc4 in my MF results repository)
Do you think we need to explicitly open the process's other streams and then close them? That is the only other thing I can think of.
Yes, we should certainly close them:
mopacProc.waitFor();
mopacProc.getInputStream().close();
mopacProc.getOutputStream().close();
mopacProc.getErrorStream().close();
is = null;
as described at http://www.coderanch.com/t/275612/Streams/java/java-io-IOException-Too-many
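The stream-closing pattern above could be wrapped up roughly as follows. This is only a sketch, not the RMG code: the class name ProcessRunner and the method run are hypothetical, and it assumes the child process writes a modest amount of text that can be read fully before waiting for it to exit. It uses only the standard ProcessBuilder and Process APIs.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of RMG): runs an external command and makes
// sure all three pipes to the child are closed, even if something fails.
final class ProcessRunner {

    static String run(String... command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process proc = builder.start();
        StringBuilder output = new StringBuilder();
        try {
            // We send nothing to the child's stdin, so close it right away.
            proc.getOutputStream().close();
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(proc.getInputStream(), StandardCharsets.UTF_8));
            String line;
            // Drain the output before waitFor() so the child cannot block
            // on a full pipe buffer.
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
            proc.waitFor();
        } finally {
            // Each Process holds three file descriptors in the parent;
            // release them explicitly rather than waiting for the GC.
            closeQuietly(proc.getInputStream());
            closeQuietly(proc.getOutputStream());
            closeQuietly(proc.getErrorStream());
            proc.destroy();
        }
        return output.toString();
    }

    private static void closeQuietly(Closeable stream) {
        try {
            stream.close();
        } catch (IOException ignored) {
            // Nothing useful can be done if close() itself fails.
        }
    }
}
```

A call such as ProcessRunner.run("echo", "hello") should return the command's output and leave no pipes to the child open in the JVM. The finally block is the important design choice here: the descriptors are released even when reading the output or waiting for the child throws.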
Ah, good, thanks. This is worth a shot, then; I'll try it out. What is the need (if any) for the is = null line? I saw it in the links you sent but couldn't figure out why they had it.
My guess: it makes it easier for the garbage collector to clean up the process, because there's no longer a reference to it. In your case it'll probably go out of scope soon anyway.
OK, it looks like c56ff8d should address this, as there are now only 6 RMG-related streams open at any time:
gmagoon@node16:~$ lsof | grep pipe
java 20995 gmagoon 12w FIFO 0,8 0t0 3905225 pipe
java 20995 gmagoon 13r FIFO 0,8 0t0 3905226 pipe
java 20995 gmagoon 15r FIFO 0,8 0t0 3905227 pipe
MOPAC2009 21614 gmagoon 0r FIFO 0,8 0t0 3905225 pipe
MOPAC2009 21614 gmagoon 1w FIFO 0,8 0t0 3905226 pipe
MOPAC2009 21614 gmagoon 2w FIFO 0,8 0t0 3905227 pipe
lsof 21615 gmagoon 1w FIFO 0,8 0t0 3905244 pipe
lsof 21615 gmagoon 5w FIFO 0,8 0t0 3905252 pipe
lsof 21615 gmagoon 6r FIFO 0,8 0t0 3905253 pipe
grep 21616 gmagoon 0r FIFO 0,8 0t0 3905244 pipe
lsof 21617 gmagoon 4r FIFO 0,8 0t0 3905252 pipe
lsof 21617 gmagoon 7w FIFO 0,8 0t0 3905253 pipe
gmagoon@node16:~$ lsof | grep pipe
java 20995 gmagoon 12w FIFO 0,8 0t0 3905312 pipe
java 20995 gmagoon 13r FIFO 0,8 0t0 3905313 pipe
java 20995 gmagoon 15r FIFO 0,8 0t0 3905314 pipe
python 21619 gmagoon 0r FIFO 0,8 0t0 3905312 pipe
python 21619 gmagoon 1w FIFO 0,8 0t0 3905313 pipe
python 21619 gmagoon 2w FIFO 0,8 0t0 3905314 pipe
lsof 21620 gmagoon 1w FIFO 0,8 0t0 3905328 pipe
lsof 21620 gmagoon 5w FIFO 0,8 0t0 3905336 pipe
lsof 21620 gmagoon 6r FIFO 0,8 0t0 3905337 pipe
grep 21621 gmagoon 0r FIFO 0,8 0t0 3905328 pipe
lsof 21622 gmagoon 4r FIFO 0,8 0t0 3905336 pipe
lsof 21622 gmagoon 7w FIFO 0,8 0t0 3905337 pipe
I'm marking this as closed, hopefully for good this time.
I also made similar changes for non-QMTP processes in 492393f. Presumably issues with these hadn't cropped up yet because the frequency of Runtime calls is much lower without QMTP.
an attempt to fix issue ReactionMechanismGenerator#141
trying to address issue ReactionMechanismGenerator#141; investigation suggested an excess of instances of "pipe" in lsof
still trying to address issue ReactionMechanismGenerator#141; investigation suggested an excess of instances of "pipe" in lsof cf. 00e6aa3
My QMTP job just died with the following exception:
Could it be that some of the QMTP code is forgetting to close files after it has read from them? On Linux you can run the command
lsof
to see a list of the files open at any given time. Pipe this to grep to filter out lines of interest, e.g.
lsof | grep QMfiles
On pharos you may need to ssh into the node that is running the RMG calculation first.
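As a complement to running lsof by hand, the JVM can report its own descriptor count on Linux by listing /proc/self/fd. The following is a minimal sketch under that assumption; the class name OpenFdCounter is hypothetical and nothing like it is known to exist in RMG. On systems without /proc it simply returns -1.

```java
import java.io.File;

// Hypothetical diagnostic (not part of RMG): counts the file descriptors
// currently open in this JVM by listing /proc/self/fd (Linux only).
final class OpenFdCounter {

    /** Returns the number of open descriptors, or -1 if /proc/self/fd cannot be listed. */
    static int countOpenDescriptors() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }

    public static void main(String[] args) {
        // Logging this periodically during a long run would show whether
        // descriptors (files or pipes) are accumulating over time.
        System.out.println("Open descriptors: " + countOpenDescriptors());
    }
}
```

The count includes pipes and sockets as well as regular files, and the directory listing itself may briefly hold one descriptor, so the trend over time matters more than the exact value.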