randomstats with many iterations opens too many files #38
Comments
Thanks for reporting this. Setting the user file limit is an ugly, ugly solution, and I agree it's unlikely to be sufficient. I suspect this bug has something to do with the Cython-wrapped …
Wow, nefarious bug. It seems to have been caused by stdin/stdout/stderr filehandles not getting cleaned up by the subprocess.Popen instances created every time a BEDTools command is called. This is now fixed in 703f7e2, but only for Python 2.7 (see the commit comment at the bottom of that page for details). For now I'm keeping this in a separate branch until I figure out another way that works for both 2.6 and 2.7. If you're stuck on 2.6, you can redirect stderr to avoid seeing the crazy number of …
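The gist of it (just a sketch of the idea, not the actual commit; the function name and details here are illustrative) is to close the pipes that subprocess.Popen opens as soon as the command finishes, instead of leaving them around for the garbage collector:

```python
import subprocess

def call_bedtools_sketch(cmds, outfile):
    # Hypothetical simplification of a single BEDTools call; the real
    # pybedtools code differs in detail.
    with open(outfile, 'w') as out:
        p = subprocess.Popen(cmds, stdout=out, stderr=subprocess.PIPE)
        stderr = p.stderr.read()
        p.wait()
    # The important part: close the remaining pipe explicitly rather than
    # waiting for the Popen object to be garbage collected.  With tens of
    # thousands of calls, unclosed pipes hit the open-file limit long
    # before gc gets to them.
    p.stderr.close()
    return stderr
```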
Hrmm. I'm getting a … Sorry for the close...
Yeah, if you install from the git repo this will work. First get the new filehandle branch, then run the build step (which ought to have Cython generate the .cpp file), and finally build and install.
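Roughly like this — I'm assuming here that the branch is literally named filehandle and that you're working from a git clone, so adjust as needed:

```sh
# get the branch with the filehandle fix (branch name assumed)
git fetch origin
git checkout filehandle

# regenerate the .cpp file with Cython
python build.py

# then build and install as usual
python setup.py install
```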
Does that work? Also . . . I think I'd rather keep this issue open, since it isn't as elegant a solution as I'd like.
Ah, works perfectly now. I didn't mean to close the issue-- just GitHub's "Comment & Close" button placement... Makes sense about the Cython dependency. I haven't seen the setup.py and build.py separation before. |
The build works fine, but I'm getting a strange memory error:
I've also seen this show up as:
RAM is not the issue here-- usage is very low and nowhere near the 12 GB limit on this machine. Removing your try/except in call_bedtools gives:
and suddenly the terminal that ran this process can't fork any more jobs, which makes it look like too many processes have been forked without cleanup...?
Hmm. Seems like more subprocess annoyances . . . this may take some time to debug. I'm worried pybedtools may be pushing subprocess in ways it wasn't designed for, at least for the … As for the cleanup, things should automatically get cleaned up if everything exits normally (thanks to …).
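(The automatic cleanup I mean is the usual atexit-style pattern — a generic sketch, not the exact pybedtools code, and the registry name below is made up:)

```python
import atexit
import os

_TEMP_FILES = []  # hypothetical registry of temp files created during a session

def _cleanup():
    # Runs at normal interpreter exit; a hard kill or crash skips this,
    # which is why abnormal exits can leave things behind.
    for fn in _TEMP_FILES:
        if os.path.exists(fn):
            os.unlink(fn)

atexit.register(_cleanup)
```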
Still having trouble with the memory problem. However, while debugging I found that in Python 2.6, if I use a … I'm not fluent with multiprocessing. I tried removing the conditional and got "AssertionError: daemonic processes are not allowed to have children"
(probably good universal advice in addition to an assertion error). Do you know of a way of letting a Pool run with a single process? If so, this could fix the py2.6 problem so these changes could be merged into the master branch. edit: never mind about the merging . . . everything would have to be run through a Pool to get this benefit.
You'll get infinite recursion if you remove the conditional completely. The excellent error message is catching the Pool'ed process trying to create its own Pool (which would then create its own, etc). Better to do something like:
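That is, keep some form of the check, but fall back to plain serial execution when only one process is wanted. A rough sketch of what I mean (names are illustrative, not the actual pybedtools code):

```python
import multiprocessing

def run_iterations(func, args_list, processes=1):
    # With processes=1, skip multiprocessing entirely so this can also be
    # called safely from inside a worker process; only spin up a Pool when
    # we actually want parallelism.
    if processes == 1:
        return [func(args) for args in args_list]

    p = multiprocessing.Pool(processes)
    try:
        return p.map(func, args_list)
    finally:
        p.close()
        p.join()
```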
Ah, right. Thanks for the info. Since the memory issue, and general …
Short answer: I think all of these issues are fixed as of 1ddb49c. Can you please test?

Long answer and notes to self: for file-based BedTools, the iterator is a Cython … For stream-based BedTools, calling … However, adding logic to … does not fix the problem. My guess is that this is due to circular references that need to be tracked down and broken (i.e., search for …). So: the issue is fixed, but it will be nice to have …
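For reference, the tracking down I have in mind is just standard gc-module inspection, something along these lines:

```python
import gc

gc.set_debug(gc.DEBUG_SAVEALL)  # keep everything the collector finds

# ... run a few iterations of the offending code here ...

gc.collect()
for obj in gc.garbage:
    # Objects that were part of reference cycles end up here; printing a few
    # of their referrers usually points at the attribute that needs breaking.
    print(type(obj), gc.get_referrers(obj)[:3])
```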
Yeah, the fix looks good to me, even on my larger files with hundreds of thousands of iterations and several processes. Thanks for the update!
I have two fairly small BED files (~1600 features in each, overlapping by ~500 features). I'd like to calculate the empirical overlap p-value, but to get a decently small p-value I need a lot of shuffles.
pybedtools is apparently opening a new file for each iteration (not sure how, since the files are clearly unlinked and the Python objects are deleted as soon as the intersect results are done). Relevant code:
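It's basically this pattern (simplified; file names, the genome, and the iteration count are placeholders):

```python
import os
import pybedtools

a = pybedtools.BedTool('a.bed')
b = pybedtools.BedTool('b.bed')
observed = len(a.intersect(b, u=True))

n_iterations = 1000000  # needs to be large to resolve a small p-value
count = 0
for _ in range(n_iterations):
    shuffled = b.shuffle(genome='hg19')
    result = a.intersect(shuffled, u=True)
    if len(result) >= observed:
        count += 1
    # unlink the temp files and drop the objects as soon as we're done,
    # yet the number of open files still climbs each iteration
    os.unlink(shuffled.fn)
    os.unlink(result.fn)
    del shuffled, result

empirical_p = (count + 1.0) / (n_iterations + 1.0)
print(empirical_p)
```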
Any way around this? I can increase the user limit for open files on this machine, but I doubt it will suffice for tens of millions of overlaps... Looks like it's crapping out at only 4541 files on this machine.
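(For reference, raising the per-process limit is straightforward, it just doesn't scale to the number of iterations I need:)

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print('soft=%d hard=%d' % (soft, hard))

# raise the soft limit to the hard limit for this process; going beyond
# the hard limit requires root / ulimit changes
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```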