re-usable pools for random_op methods #69

Closed
daler opened this issue Dec 4, 2012 · 3 comments

Comments

daler (Owner) commented Dec 4, 2012

In the end, it should look something like this:

import multiprocessing

# bt is an existing BedTool instance; the same pool is shared across all three calls
mypool = multiprocessing.Pool(25)
bt.randomstats(_orig_pool=mypool, *args, **kwargs)
bt.random_op(_orig_pool=mypool, *args, **kwargs)
bt.random_jaccard(_orig_pool=mypool, *args, **kwargs)
brentp (Contributor) commented Dec 4, 2012

Some other suggestions after using this a bit more:

  1. random_op might be better named something like parallel_apply, since there's no actual randomness (this confused me when I first found the method).
  2. Maybe you could allow specifying a reduce function, so it could take all the results and return a summary (like randomstats does) -- see the sketch after this list.
  3. Is there a way to get around the wrapper functions in stats.py and allow methods more directly? E.g. if func is a method, automatically create a wrapper:

     def new_func(self, other, **kwargs):
         return getattr(self, func.__name__)(other, **kwargs)

(or something like that -- completely untested)
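
For concreteness, suggestion 2 might look something like the following standalone helper (names are hypothetical, not an existing pybedtools API): a pool maps a function over many argument sets, and an optional reduce callback collapses the results into a single summary.

import multiprocessing

def parallel_apply_with_reduce(func, args_list, reduce_func=None, processes=4):
    # Hypothetical helper: run `func` over many argument sets in parallel,
    # then optionally collapse the collected results with `reduce_func`
    # (e.g. a mean, as a randomstats-style summary would need).
    pool = multiprocessing.Pool(processes)
    try:
        results = pool.map(func, args_list)
    finally:
        pool.close()
        pool.join()
    return reduce_func(results) if reduce_func is not None else results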

daler (Owner) commented Dec 4, 2012

  1. Good point.
  2. Also a good point.
  3. I think this will be complicated; see below.

Class methods are not well supported (maybe not supported at all?) across process boundaries because they can't be pickled (as pool.apply() complains). I don't know the details, but basically I found that you can only pass plain functions. I haven't tested wrapping a method in a function as you suggest, though...
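
For reference, here is a minimal sketch of that wrapper idea: a module-level function receives the instance and the method name, so nothing unpicklable crosses the process boundary (hypothetical names, not the code in stats.py; it assumes BedTool instances themselves pickle cleanly).

import multiprocessing

def _call_method(args):
    # Module-level functions pickle fine; the instance plus the method
    # *name* are shipped to the worker instead of a bound method.
    bedtool, method_name, other, kwargs = args
    return getattr(bedtool, method_name)(other, **kwargs)

# Usage sketch, assuming `a` and `b` are BedTool instances:
# pool = multiprocessing.Pool(4)
# results = pool.map(_call_method, [(a, "intersect", b, {"u": True}) for _ in range(100)])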

Also, class variables do not share state across processes. Importantly, this includes BedTool._TEMPFILES. I was getting all sorts of strange behavior using multiprocessing and pybedtools' existing auto-handling of temp files, and files were not being completely cleaned up.
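
To illustrate the class-variable problem in isolation (generic Python, not pybedtools code): each worker process gets its own copy of the class attribute, so anything registered in a worker never reaches the parent -- the same reason temp files tracked in BedTool._TEMPFILES can escape cleanup.

import multiprocessing

class Registry:
    # Class-level list, analogous in spirit to BedTool._TEMPFILES
    files = []

def register(name):
    Registry.files.append(name)  # appends only to this worker's copy
    return name

if __name__ == "__main__":
    pool = multiprocessing.Pool(2)
    pool.map(register, ["a.tmp", "b.tmp", "c.tmp"])
    pool.close()
    pool.join()
    print(Registry.files)  # prints [] -- the parent's list was never updated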

So stats.py was my attempt to address both of these issues by having functions work on instances passed to the process (via the func args) and by being careful about cleaning up tempfiles.

Anyway, I agree that something like this would be useful but I think it will take some playing around with.

daler (Owner) commented May 4, 2013

Thanks to your suggestions, I made a new, general way of applying any arbitrary BedTool method many times in parallel -- see 3f3673c. Eventually, I'd like to deprecate randomstats and the stuff in stats.py in favor of this since it's 1) a lot cleaner and 2) a lot more general.

daler closed this as completed May 4, 2013