java.io.IOException: error=24, Too many open files #141

Closed
rwest opened this issue Jan 29, 2011 · 17 comments

@rwest
Member

rwest commented Jan 29, 2011

My QMTP job just died with the following exception:

Ea raised by 38.7 from 43.7 to dHrxn(298K)=82.4 kcal/mol.
Pre-existing successful quantum result for FZBISMRNOHHVPK-UHFFFAOYAQ (InChI=1/C3H4O6/c4-2(5)8-3(9-2)6-1-7-3/h4-5H,1H2) has been found. This log file will be used.
Point group: Cs
Thermo for FZBISMRNOHHVPK-UHFFFAOYAQ: -235.73   89.11   32.08   38.96   44.65   49.13   55.44   59.59   65.47   
HBI-based thermo for XGQQZMNHCWBJKP-UHFFFAOYAGmult4(InChI=1/C3HO6/c4-2(5)8-3(9-2)6-1-7-3/h1H/mult4): -82.81 88.8    29.35   35.0    39.52   42.95   47.52   50.25   53.63   
Created new species: C3HO6JJJ(14995)
Created new reaction: C2HO3J(1751) + CO3JJ(994) --> C3HO6JJJ(14995)
Error running cINChI-1: java.io.IOException: Cannot run program "/home/rwest/RMG-Java/bin/cInChI-1" (in directory "InChI"): java.io.IOException: error=24, Too many open files
Pre-existing successful quantum result for YRLKOWLKIUXBTJ-UHFFFAOYAK (InChI=1/C3H4O6/c4-2(5)3(9-8-2)6-1-7-3/h4-5H,1H2) has been found. This log file will be used.
java.io.IOException: Cannot run program "python": java.io.IOException: error=24, Too many open files
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
    at java.lang.Runtime.exec(Runtime.java:610)
    at java.lang.Runtime.exec(Runtime.java:448)
    at java.lang.Runtime.exec(Runtime.java:345)
    at jing.chem.QMTP.getPM3MM4ThermoDataUsingCCLib(QMTP.java:1659)
    at jing.chem.QMTP.parseGaussianPM3(QMTP.java:1454)
    at jing.chem.QMTP.generateQMThermoData(QMTP.java:313)
    at jing.chem.QMTP.generateThermoData(QMTP.java:148)
    at jing.chem.ChemGraph.generateThermoData(ChemGraph.java:1300)
    at jing.chem.ChemGraph.getThermoData(ChemGraph.java:1795)
    at jing.chem.Species.findStablestThermoData(Species.java:338)
    at jing.chem.Species.<init>(Species.java:118)
    at jing.chem.Species.make(Species.java:878)
    at jing.rxn.ReactionTemplate.reactTwoReactants(ReactionTemplate.java:1196)
    at jing.rxn.TemplateReactionGenerator.react(TemplateReactionGenerator.java:181)
    at jing.rxnSys.RateBasedPDepRME.addSpeciesToCore(RateBasedPDepRME.java:317)
    at jing.rxnSys.RateBasedPDepRME.enlargeReactionModel(RateBasedPDepRME.java:235)
    at jing.rxnSys.ReactionModelGenerator.enlargeReactionModel(ReactionModelGenerator.java:3963)
    at jing.rxnSys.ReactionModelGenerator.modelGeneration(ReactionModelGenerator.java:1452)
    at RMG.main(RMG.java:57)
Caused by: java.io.IOException: java.io.IOException: error=24, Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
    at java.lang.ProcessImpl.start(ProcessImpl.java:81)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
    ... 19 more

Could it be that some of the QMTP code is forgetting to close files after it has read from them? On Linux you can run the command lsof to list the open files at any given time, and pipe its output to grep to filter out lines of interest, e.g. lsof | grep QMfiles. On pharos you may need to ssh into the node that is running the RMG calculation first.
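On Linux, a JVM can also inspect its own descriptor table without lsof by listing /proc/self/fd, which contains one entry per open descriptor. The sketch below assumes a Linux /proc filesystem; OpenFdCounter is a hypothetical name, not part of RMG-Java.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical diagnostic helper, not part of RMG-Java. Linux only: it relies on /proc.
class OpenFdCounter {

    /** Returns the number of file descriptors this JVM process currently holds open. */
    static long countOpenFds() throws IOException {
        // Each entry in /proc/self/fd is one open descriptor. The directory
        // stream opened to list it is counted too, but equally on every call,
        // so the difference between two calls is still meaningful.
        try (Stream<Path> fds = Files.list(Paths.get("/proc/self/fd"))) {
            return fds.count();
        }
    }
}
```

Logging this value periodically during a run would show whether descriptors accumulate over time, which is the signature of a leak rather than a momentary burst.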

@gmagoon
Contributor

gmagoon commented Jan 29, 2011

That's what I was thinking too...I'll take a look tonight...

@gmagoon
Contributor

gmagoon commented Jan 29, 2011

I'm running the example case right now and it doesn't seem to have more than 9 files open at a time (i.e. they don't seem to be accumulating). But maybe it is more of an issue when reading in lots of pre-existing QM files at once (maybe they are closed automatically after a certain time, which would explain why I'm not seeing them pile up in the run where all the calculations are performed). I'll explore further.

@rwest
Member Author

rwest commented Jan 29, 2011

My run certainly had very many pre-existing files. It also resumed from a restart, IIRC.

@gmagoon
Contributor

gmagoon commented Jan 29, 2011

Restarting with pre-existing QMfiles seems to leave even fewer files open at once: at most 4. I suppose it should be 1 (or maybe 2), though. I'll push a commit with some additional file-closing lines and see if that helps.
Thanks for the lsof tips, by the way, Richard.

@gmagoon
Contributor

gmagoon commented Jan 29, 2011

The additional file-closing lines seem to have helped noticeably: now when I rerun (reading in pre-existing results), "lsof | grep /QMfiles/" on the compute node shows no QMfiles open most of the time, and occasionally one. I'm marking this closed by abb90e4 for the time being, but if the problem recurs, please let me know.
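For reference, a minimal sketch of what such file-closing changes can look like, assuming the log is scanned line by line with a BufferedReader. QMLogScanner and findFirstLineStartingWith are hypothetical names, not the code from abb90e4. With try-with-resources (Java 7 and later) the reader, and with it the file descriptor, is closed on every exit path, including early returns and exceptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not the actual QMTP parsing code.
class QMLogScanner {

    /**
     * Returns the first line of the log that starts with the given prefix,
     * or null if there is none. The file is closed before this method returns.
     */
    static String findFirstLineStartingWith(Path log, String prefix) throws IOException {
        // ISO-8859-1 maps every byte to a char, so unusual bytes in a log
        // cannot cause a decoding error.
        try (BufferedReader reader = Files.newBufferedReader(log, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(prefix)) {
                    return line; // reader.close() still runs on this early return
                }
            }
            return null;
        }
    }
}
```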

@rwest
Member Author

rwest commented Feb 3, 2011

This just occurred again.

Thermo for RDISLAOLCYWJGD-UHFFFAOYAP: -83.68    100.93  33.74   40.27   45.57   49.76   55.8    59.88   65.76   
HBI-based thermo for NBUDTFKOWOCMPD-UHFFFAOYACmult5(InChI=1/C4O5/c5-1-2-7-9-4-3(6)8-4/mult5): 125.38    106.08  31.03   35.5    38.78   41.15   44.28   46.11   48.32   
Pre-existing successful quantum result for DTKHHSNVCNRSOS-UHFFFAOYAX (InChI=1/C4H4O5/c5-1-2-7-9-4-3(6)8-4/h3-6H) has been found. This log file will be used.
java.io.IOException: Cannot run program "python": java.io.IOException: error=24, Too many open files
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
    at java.lang.Runtime.exec(Runtime.java:610)
    at java.lang.Runtime.exec(Runtime.java:448)
    at java.lang.Runtime.exec(Runtime.java:345)
    at jing.chem.QMTP.getPM3MM4ThermoDataUsingCCLib(QMTP.java:1662)
    at jing.chem.QMTP.parseGaussianPM3(QMTP.java:1457)
    at jing.chem.QMTP.generateQMThermoData(QMTP.java:313)
    at jing.chem.QMTP.generateThermoData(QMTP.java:148)
    at jing.chem.ChemGraph.generateThermoData(ChemGraph.java:1300)
    at jing.chem.ChemGraph.getThermoData(ChemGraph.java:1795)
    at jing.chem.Species.findStablestThermoData(Species.java:345)
    at jing.chem.Species.<init>(Species.java:118)
    at jing.chem.Species.make(Species.java:878)
    at jing.rxn.ReactionTemplate.reactTwoReactants(ReactionTemplate.java:1196)
    at jing.rxn.TemplateReactionGenerator.react(TemplateReactionGenerator.java:232)
    at jing.rxnSys.RateBasedPDepRME.addSpeciesToCore(RateBasedPDepRME.java:317)
    at jing.rxnSys.RateBasedPDepRME.enlargeReactionModel(RateBasedPDepRME.java:235)
    at jing.rxnSys.ReactionModelGenerator.enlargeReactionModel(ReactionModelGenerator.java:3963)
    at jing.rxnSys.ReactionModelGenerator.modelGeneration(ReactionModelGenerator.java:1452)
    at RMG.main(RMG.java:57)
Caused by: java.io.IOException: java.io.IOException: error=24, Too many open files
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
    at java.lang.ProcessImpl.start(ProcessImpl.java:81)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
    ... 19 more

RMG was at 62d1a55
The model core has 8147 reactions and 174 species.
The model edge has 18476 reactions and 15633 species.
There are 9279 files in my QMfiles folder.

@gmagoon
Contributor

gmagoon commented Feb 3, 2011

OK, I think I've narrowed down the problem. It seems to be associated with pipes, which accumulate as the run progresses:

gmagoon@node55:~$ lsof | grep pipe
java      14421     gmagoon    8w     FIFO    0,8      0t0    624645 pipe
java      14421     gmagoon   10w     FIFO    0,8      0t0    624657 pipe
java      14421     gmagoon   12w     FIFO    0,8      0t0    624671 pipe
java      14421     gmagoon   13r     FIFO    0,8      0t0    624672 pipe
java      14421     gmagoon   14w     FIFO    0,8      0t0    624690 pipe
java      14421     gmagoon   15r     FIFO    0,8      0t0    624673 pipe
java      14421     gmagoon   16r     FIFO    0,8      0t0    624691 pipe
java      14421     gmagoon   17w     FIFO    0,8      0t0    624709 pipe
java      14421     gmagoon   18r     FIFO    0,8      0t0    624692 pipe
java      14421     gmagoon   19r     FIFO    0,8      0t0    624710 pipe
java      14421     gmagoon   20w     FIFO    0,8      0t0    624761 pipe
java      14421     gmagoon   21r     FIFO    0,8      0t0    624711 pipe
java      14421     gmagoon   22w     FIFO    0,8      0t0    624762 pipe
java      14421     gmagoon   23r     FIFO    0,8      0t0    624786 pipe
java      14421     gmagoon   24w     FIFO    0,8      0t0    624763 pipe
java      14421     gmagoon   25r     FIFO    0,8      0t0    624787 pipe
java      14421     gmagoon   26w     FIFO    0,8      0t0    624814 pipe
java      14421     gmagoon   27r     FIFO    0,8      0t0    624788 pipe
java      14421     gmagoon   29r     FIFO    0,8      0t0    624845 pipe
java      14421     gmagoon   31w     FIFO    0,8      0t0    624913 pipe
java      14421     gmagoon   33r     FIFO    0,8      0t0    624948 pipe
java      14421     gmagoon   35r     FIFO    0,8      0t0    624985 pipe
java      14421     gmagoon   36w     FIFO    0,8      0t0    624986 pipe
java      14421     gmagoon   37r     FIFO    0,8      0t0    625027 pipe
java      14421     gmagoon   38w     FIFO    0,8      0t0    624987 pipe
java      14421     gmagoon   39r     FIFO    0,8      0t0    625028 pipe
java      14421     gmagoon   40w     FIFO    0,8      0t0    625069 pipe
java      14421     gmagoon   41r     FIFO    0,8      0t0    625029 pipe
java      14421     gmagoon   42w     FIFO    0,8      0t0    625070 pipe
java      14421     gmagoon   44r     FIFO    0,8      0t0    625071 pipe
g03       16381     gmagoon    2w     FIFO    0,8      0t0    625071 pipe
sh        16382     gmagoon    2w     FIFO    0,8      0t0    625071 pipe
l402.exe  16383     gmagoon    2w     FIFO    0,8      0t0    625071 pipe
lsof      16387     gmagoon    1w     FIFO    0,8      0t0    625237 pipe
lsof      16387     gmagoon    5w     FIFO    0,8      0t0    625245 pipe
lsof      16387     gmagoon    6r     FIFO    0,8      0t0    625246 pipe
grep      16388     gmagoon    0r     FIFO    0,8      0t0    625237 pipe
lsof      16389     gmagoon    4r     FIFO    0,8      0t0    625245 pipe
lsof      16389     gmagoon    7w     FIFO    0,8      0t0    625246 pipe
gmagoon@node55:~$ grep pipe temp.txt
java      14421     gmagoon    8w     FIFO    0,8      0t0    602047 pipe
java      14421     gmagoon   10w     FIFO    0,8      0t0    602062 pipe
java      14421     gmagoon   12w     FIFO    0,8      0t0    602076 pipe
java      14421     gmagoon   13r     FIFO    0,8      0t0    602077 pipe
java      14421     gmagoon   14w     FIFO    0,8      0t0    602092 pipe
java      14421     gmagoon   15r     FIFO    0,8      0t0    602078 pipe
java      14421     gmagoon   16r     FIFO    0,8      0t0    602093 pipe
java      14421     gmagoon   17w     FIFO    0,8      0t0    602111 pipe
java      14421     gmagoon   18r     FIFO    0,8      0t0    602094 pipe
java      14421     gmagoon   19r     FIFO    0,8      0t0    602112 pipe
java      14421     gmagoon   21r     FIFO    0,8      0t0    602113 pipe
g03       15064     gmagoon    2w     FIFO    0,8      0t0    602113 pipe
sh        15065     gmagoon    2w     FIFO    0,8      0t0    602113 pipe
l103.exe  15066     gmagoon    2w     FIFO    0,8      0t0    602113 pipe
lsof      15067     gmagoon    1w     FIFO    0,8      0t0    602152 pipe
lsof      15067     gmagoon    5w     FIFO    0,8      0t0    602160 pipe
lsof      15067     gmagoon    6r     FIFO    0,8      0t0    602161 pipe
grep      15068     gmagoon    0r     FIFO    0,8      0t0    602152 pipe
lsof      15069     gmagoon    4r     FIFO    0,8      0t0    602160 pipe
lsof      15069     gmagoon    7w     FIFO    0,8      0t0    602161 pipe
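Each java row in these listings is one end of a pipe. Unless its I/O is redirected, a child started with Runtime.exec or ProcessBuilder is connected to the JVM by three pipes: the JVM holds the write end of the child's stdin and the read ends of its stdout and stderr. The first listing shows this pairing directly: g03 writes its stderr (2w) into pipe 625071, whose read end java holds as 44r. The sketch below counts the JVM's pipe descriptors while one child is alive; it assumes Linux /proc and a cat binary on the PATH, and PipeCounter is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical diagnostic helper (Linux only), not part of RMG-Java.
class PipeCounter {

    /** Counts this JVM's open descriptors that are pipes, i.e. the FIFO rows lsof reports. */
    static long countOpenPipes() throws IOException {
        try (Stream<Path> fds = Files.list(Paths.get("/proc/self/fd"))) {
            return fds.filter(PipeCounter::isPipe).count();
        }
    }

    private static boolean isPipe(Path fd) {
        try {
            // A pipe descriptor is a symlink whose target looks like "pipe:[624645]".
            return Files.readSymbolicLink(fd).toString().startsWith("pipe:");
        } catch (IOException e) {
            return false; // e.g. the descriptor was closed after it was listed
        }
    }

    /**
     * Starts `cat`, which blocks waiting on its stdin, and returns how many extra
     * pipe descriptors the JVM holds while the child is alive. All three streams
     * are closed afterwards so the measurement itself does not leak.
     */
    static long extraPipesPerChild() throws IOException, InterruptedException {
        long before = countOpenPipes();
        Process cat = new ProcessBuilder("cat").start();
        try {
            return countOpenPipes() - before;
        } finally {
            cat.getOutputStream().close(); // EOF on cat's stdin, so cat exits
            cat.getInputStream().close();  // read end of cat's stdout
            cat.getErrorStream().close();  // read end of cat's stderr
            cat.waitFor();
        }
    }
}
```

On a typical Linux JVM the delta is three; if those descriptors are never closed, every spawned child adds to the total until the open-file limit is reached.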

@gmagoon
Contributor

gmagoon commented Feb 4, 2011

I tried closing the readers in 00e6aa3 and fc7a3a1. Unfortunately, it had no noticeable effect on the output of "lsof | grep pipe" during a restart run; the problem may have been fixed, but I see no evidence that these changes made a difference. Any thoughts or ideas?

@gmagoon
Contributor

gmagoon commented Mar 24, 2011

When was the last time this occurred for you, Richard? I haven't been having any problems in my recent runs and I'm wondering if this can be closed.

@rwest
Member Author

rwest commented Mar 24, 2011

Haven't tried it in a while.

@rwest
Member Author

rwest commented Apr 26, 2011

This just occurred again. Both /home/rwest/RMG-Java/bin/cInChI-1 and /home/rwest/RMG-Java/bin/GATPFit.exe failed to run because of "Too many open files".

Pre-existing successful MOPAC quantum result for HEOXSTBRHHYDGD-UHFFFAOYAQ (InChI=1/C6H10O4/c1-3(4(2)7)6(9)5(8)10-6/h3,5,8-9H,1-2H3) has been found. This log file will be used.
Point group: C1
Thermo for HEOXSTBRHHYDGD-UHFFFAOYAQ: -153.02   104.79  42.51   52.55   61.26   68.46   79.31   87.02   98.73
ERROR: Error running cINChI-1: java.io.IOException: Cannot run program "/home/rwest/RMG-Java/bin/cInChI-1" (in directory "InChI"): java.io.IOException: error=24, Too many open files
HBI-based thermo for ZOQZHOVZVCMVDB-UHFFFAOYAUmult4(InChI=1/C6H7O4/c1-3(4(2)7)6(9)5(8)10-6/h3H,1-2H3/mult4): -0.1   104.48  39.78   48.59   56.13   62.28   71.39   77.68   86.89   
ERROR: jing.chem.NASAFittingException: Error in running GATPFit: jing.chem.GATPFitException: Error running GATPFit
java.io.IOException: Cannot run program "/home/rwest/RMG-Java/bin/GATPFit.exe" (in directory "GATPFit"): java.io.IOException: error=24, Too many open files
To help diagnosis, writing GATPFit input to file GATPFit/INPUT.txt

    at jing.chem.GATPFit.generateNASAThermoData(GATPFit.java:251)
    at jing.chem.Species.generateNASAThermoData(Species.java:365)
    at jing.chem.Species.<init>(Species.java:123)
    at jing.chem.Species.make(Species.java:877)
    at jing.rxn.ReactionTemplate.reactTwoReactants(ReactionTemplate.java:1197)
    at jing.rxn.TemplateReactionGenerator.react(TemplateReactionGenerator.java:216)
    at jing.rxnSys.RateBasedPDepRME.addSpeciesToCore(RateBasedPDepRME.java:316)
    at jing.rxnSys.RateBasedPDepRME.enlargeReactionModel(RateBasedPDepRME.java:236)
    at jing.rxnSys.ReactionModelGenerator.enlargeReactionModel(ReactionModelGenerator.java:3888)
    at jing.rxnSys.ReactionModelGenerator.modelGeneration(ReactionModelGenerator.java:1459)
    at RMG.main(RMG.java:96)

CRITICAL: Error in running GATPFit: jing.chem.GATPFitException: Error running GATPFit
java.io.IOException: Cannot run program "/home/rwest/RMG-Java/bin/GATPFit.exe" (in directory "GATPFit"): java.io.IOException: error=24, Too many open files
To help diagnosis, writing GATPFit input to file GATPFit/INPUT.txt

(This is job fb1df6dbf3776ce9327a3065445035a97b2d5fc4 in my MF results repository)

@gmagoon
Contributor

gmagoon commented Apr 27, 2011

Do you think we need to open the additional streams as well and close them? That is the only other thing I can think of.
That is, instead of just InputStream is = mopacProc.getErrorStream();, also have InputStream isInput = mopacProc.getInputStream(); and close both.

@rwest
Member Author

rwest commented Apr 28, 2011

Yes, we should certainly close them.
Something like

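mopacProc.waitFor();                  // wait for MOPAC to exit
mopacProc.getInputStream().close();   // MOPAC's stdout
mopacProc.getOutputStream().close();  // MOPAC's stdin
mopacProc.getErrorStream().close();   // MOPAC's stderr
is = null;                            // drop the local reference to the stream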

as described at http://www.coderanch.com/t/275612/Streams/java/java-io-IOException-Too-many
and/or http://stackoverflow.com/questions/3404284/ffmpeg-java-io-ioexception-error-24-too-many-open-files
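A fuller version of the snippet above, written as a self-contained helper. This is a sketch, not the code from the commit that eventually closed this issue; ProcessRunner and run are hypothetical names. Two choices differ from the snippet: stdout is read to the end before waitFor(), because a child that fills its pipe buffer blocks until someone reads it, so waiting first can hang; and stderr is merged into stdout with redirectErrorStream(true), so a single read loop drains both. All three streams are closed in a finally block so the pipes are released even if reading fails.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper sketch, not RMG-Java's actual implementation.
class ProcessRunner {

    /** Runs a command, returns its combined stdout/stderr, and releases all three pipes. */
    static String run(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // stderr is written into the stdout pipe
        Process proc = pb.start();
        try {
            proc.getOutputStream().close(); // child's stdin: nothing to send
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            InputStream is = proc.getInputStream(); // child's stdout (+ stderr)
            byte[] buf = new byte[8192];
            int n;
            while ((n = is.read(buf)) != -1) { // read to EOF before waiting
                out.write(buf, 0, n);
            }
            int exit = proc.waitFor();
            if (exit != 0) {
                throw new IOException(command[0] + " exited with status " + exit);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            // Closing an already-closed stream has no effect, so this is safe on every path.
            proc.getOutputStream().close();
            proc.getInputStream().close();
            proc.getErrorStream().close();
        }
    }
}
```

Because is is a local variable here, it becomes unreachable as soon as run returns, so an explicit is = null adds nothing; it is the close() calls that release the descriptors.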

@gmagoon
Contributor

gmagoon commented Apr 28, 2011

Ah, good, thanks. So this is worth a shot; I'll try it out. What is the need (if any) for the is = null line? I saw it in the links you sent but couldn't figure out why they had it.

@rwest
Member Author

rwest commented Apr 28, 2011

My guess: it makes it easier for the garbage collector to clean up the stream, because there's no longer a reference to it. In your case it will probably go out of scope soon anyway.

gmagoon added a commit that referenced this issue Apr 28, 2011
@gmagoon
Contributor

gmagoon commented Apr 28, 2011

OK, it looks like c56ff8d should address this as there are now only 6 RMG-related streams open at any time:

gmagoon@node16:~$ lsof | grep pipe
java      20995     gmagoon   12w     FIFO    0,8      0t0   3905225 pipe
java      20995     gmagoon   13r     FIFO    0,8      0t0   3905226 pipe
java      20995     gmagoon   15r     FIFO    0,8      0t0   3905227 pipe
MOPAC2009 21614     gmagoon    0r     FIFO    0,8      0t0   3905225 pipe
MOPAC2009 21614     gmagoon    1w     FIFO    0,8      0t0   3905226 pipe
MOPAC2009 21614     gmagoon    2w     FIFO    0,8      0t0   3905227 pipe
lsof      21615     gmagoon    1w     FIFO    0,8      0t0   3905244 pipe
lsof      21615     gmagoon    5w     FIFO    0,8      0t0   3905252 pipe
lsof      21615     gmagoon    6r     FIFO    0,8      0t0   3905253 pipe
grep      21616     gmagoon    0r     FIFO    0,8      0t0   3905244 pipe
lsof      21617     gmagoon    4r     FIFO    0,8      0t0   3905252 pipe
lsof      21617     gmagoon    7w     FIFO    0,8      0t0   3905253 pipe
gmagoon@node16:~$ lsof | grep pipe
java      20995     gmagoon   12w     FIFO    0,8      0t0   3905312 pipe
java      20995     gmagoon   13r     FIFO    0,8      0t0   3905313 pipe
java      20995     gmagoon   15r     FIFO    0,8      0t0   3905314 pipe
python    21619     gmagoon    0r     FIFO    0,8      0t0   3905312 pipe
python    21619     gmagoon    1w     FIFO    0,8      0t0   3905313 pipe
python    21619     gmagoon    2w     FIFO    0,8      0t0   3905314 pipe
lsof      21620     gmagoon    1w     FIFO    0,8      0t0   3905328 pipe
lsof      21620     gmagoon    5w     FIFO    0,8      0t0   3905336 pipe
lsof      21620     gmagoon    6r     FIFO    0,8      0t0   3905337 pipe
grep      21621     gmagoon    0r     FIFO    0,8      0t0   3905328 pipe
lsof      21622     gmagoon    4r     FIFO    0,8      0t0   3905336 pipe
lsof      21622     gmagoon    7w     FIFO    0,8      0t0   3905337 pipe

I'm marking this as closed, hopefully for good this time...

@gmagoon gmagoon closed this as completed Apr 28, 2011
@gmagoon
Contributor

gmagoon commented Apr 28, 2011

I also made similar changes for non-QMTP processes in 492393f. Presumably these hadn't caused problems yet because Runtime calls are much less frequent without QMTP.
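A leak of a few descriptors per call only becomes fatal after enough calls, which fits the reasoning that the much rarer non-QMTP Runtime calls had not run into it yet. As a regression check in the spirit of the lsof comparisons in this thread, the sketch below spawns many short-lived children through Runtime.exec, closes every stream, and reports how much the JVM's pipe count changed. It assumes Linux /proc and a true binary on the PATH; SpawnLeakCheck and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical regression check (Linux only), not part of RMG-Java.
class SpawnLeakCheck {

    /** Counts this JVM's open pipe descriptors via /proc/self/fd. */
    static long countOpenPipes() throws IOException {
        try (Stream<Path> fds = Files.list(Paths.get("/proc/self/fd"))) {
            return fds.filter(fd -> {
                try {
                    return Files.readSymbolicLink(fd).toString().startsWith("pipe:");
                } catch (IOException e) {
                    return false; // descriptor vanished between listing and reading
                }
            }).count();
        }
    }

    /** Spawns `true` the given number of times and returns the change in open pipes. */
    static long pipeGrowthAfter(int spawns) throws IOException, InterruptedException {
        long before = countOpenPipes();
        for (int i = 0; i < spawns; i++) {
            Process p = Runtime.getRuntime().exec(new String[] {"true"});
            p.waitFor();                  // `true` writes nothing, so waiting first cannot block
            p.getInputStream().close();   // stdout
            p.getOutputStream().close();  // stdin
            p.getErrorStream().close();   // stderr
        }
        return countOpenPipes() - before;
    }
}
```

A result above zero after a few hundred spawns would indicate that some code path still leaves pipes open.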

shamelmerchant pushed a commit to shamelmerchant/RMG-Java that referenced this issue Jun 23, 2011
shamelmerchant pushed a commit to shamelmerchant/RMG-Java that referenced this issue Jun 23, 2011
trying to address issue ReactionMechanismGenerator#141; investigation suggested an excess of instances of "pipe" in lsof
shamelmerchant pushed a commit to shamelmerchant/RMG-Java that referenced this issue Jun 23, 2011
still trying to address issue ReactionMechanismGenerator#141; investigation suggested an excess of instances of "pipe" in lsof
cf. 00e6aa3