OS X + NumPy + Multiprocessing = ?? #206
I think we should check whether the user has an incompatible BLAS. If BLAS is incompatible, we should create the pool using multiprocessing.dummy. If the user has an incompatible BLAS and asks for more than one process, we should also warn them that we won't create additional processes. @evz, thoughts?
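A minimal Python sketch of the approach proposed above. `pool_module` and its `blas_fork_safe` flag are hypothetical names, not Dedupe's actual code, and how the flag gets computed is left open. `multiprocessing.dummy` mirrors the `multiprocessing` API but is backed by threads, so choosing it avoids forking altogether.

```python
import multiprocessing
import multiprocessing.dummy
import warnings


def pool_module(blas_fork_safe, num_processes):
    """Return the module whose Pool should be used (hypothetical helper).

    If BLAS is not fork-safe, fall back to multiprocessing.dummy, which
    offers the same Pool interface but runs workers as threads.
    """
    if blas_fork_safe:
        return multiprocessing
    if num_processes > 1:
        # Only warn when the user actually asked for extra processes.
        warnings.warn(
            "Incompatible BLAS detected; using threads instead of "
            f"{num_processes} processes.",
            RuntimeWarning,
        )
    return multiprocessing.dummy
```

Something like `pool_module(False, 4).Pool(4)` then behaves like a regular pool. Keep in mind that threads share the GIL, so pure-Python work will not run in parallel under the fallback.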
Sounds like a sane approach. I guess I just want to make sure we're interpreting the output of that call correctly:
>>> numpy.distutils.system_info.get_info('blas')
{'language': 'f77', 'libraries': ['blas'], 'library_dirs': ['/usr/lib']}
When I call that on the Ubuntu 12.04 server where we have the web deduper running, I get:
>>> numpy.distutils.system_info.get_info('blas')
{}
I get the same thing when I make that call on the server using
sysinfo.get_info('blas')
On Tue, Feb 25, 2014 at 2:17 PM, Eric van Zanten
773.888.2718
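One way to read those two results, assuming `get_info` returns an empty dict when it finds nothing (which matches the Ubuntu output above). As far as I understand it, `numpy.distutils.system_info` is a build helper that probes the current system for libraries, so its answer does not necessarily describe the BLAS an already-installed NumPy was linked against. `describe_blas_info` is a hypothetical name.

```python
def describe_blas_info(info):
    """Summarize a dict like the ones returned by get_info('blas') above."""
    if not info:
        # An empty dict means the probe found no BLAS library on this system.
        return "no BLAS found by numpy.distutils"
    libs = ", ".join(info.get("libraries", [])) or "(no library names)"
    dirs = ", ".join(info.get("library_dirs", [])) or "(no library dirs)"
    return f"BLAS libraries {libs} in {dirs}"
```

For the first dict above this yields `BLAS libraries blas in /usr/lib`, and for `{}` it yields `no BLAS found by numpy.distutils`.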
A particularly interesting thread, given that I can't actually get any output on the Ubuntu machine for what BLAS was used when compiling NumPy: numpy/numpy#3912. It would seem that NumPy ships with a slower, perhaps pure-Python, implementation of BLAS that it falls back on if the OS does not have one available. So it is possible for someone (like me) to install NumPy without actually having that library present on their system.
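To see what an installed NumPy was actually built against, `numpy.show_config()` prints its build configuration. The sketch below (the helper name is hypothetical) captures that printout as text so it can be searched. The output format differs between NumPy versions and may not list every library on every build, so treat string matching on it as approximate.

```python
import contextlib
import io


def numpy_build_config():
    """Return NumPy's build-configuration report as text, or '' if NumPy is missing."""
    try:
        import numpy
    except ImportError:
        return ""
    buffer = io.StringIO()
    # show_config() prints to stdout, so redirect it into a string buffer.
    with contextlib.redirect_stdout(buffer):
        numpy.show_config()
    return buffer.getvalue()
```

A rough check would then be something like `"accelerate" in numpy_build_config().lower()`.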
Implemented a fallback to multiprocessing.dummy if the BLAS library is linked with Accelerate. @evz, let's also provide some guidance in the README.md, and then I think we can close this issue.
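A sketch of how an Accelerate check could look, assuming the BLAS info dict names the Apple framework in `extra_link_args` (for example `['-Wl,-framework', '-Wl,Accelerate']`). That dict shape, and where to obtain it (older NumPy releases exposed dicts such as `numpy.__config__.blas_opt_info`), are assumptions here rather than a documented interface. `uses_apple_blas` is a hypothetical name.

```python
def uses_apple_blas(blas_info):
    """Guess whether a BLAS info dict points at Apple's Accelerate or vecLib.

    Assumption: the framework appears as the last token of a linker
    argument in 'extra_link_args', e.g. '-Wl,Accelerate'.
    """
    link_args = blas_info.get("extra_link_args", [])
    return any(arg.endswith(("Accelerate", "vecLib")) for arg in link_args)
```

Its result could feed the `blas_fork_safe`-style decision discussed earlier in the thread (as `not uses_apple_blas(info)`).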
Can't seem to get this to work. I first pip-uninstalled numpy and Dedupe, then tried following the installation guide:
Then I ran the example:
/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Dedupe-0.5-py2.7-ma
@evz, can you look at this?
@fgregg, I think that's the expected behavior, right? If you're running Dedupe on OS X, we disable multiprocessing and you get that warning. @malkomalko, what happens after you get the warning?
@evz, @malkomalko said he installed numpy without BLAS support, so that
On Fri, Feb 28, 2014 at 8:58 AM, Eric van Zanten
@fgregg, that is correct. My understanding was that if I compiled NumPy without BLAS, I would be able to take advantage of multiple cores and run in parallel. If that is not the case, maybe the README should be updated? @evz, I was still able to run the first example you gave. I'm just trying to figure out whether we can in fact take advantage of parallelism here, as I'd like to test this on a semi-big job. Thanks for this piece of software, by the way; it's been a very useful learning experience for me.
@malkomalko, it should be the case. You found a bug.
@malkomalko, would it be easy for you to join the freenode IRC channel #dedupe?
@malkomalko Well, after asking for some advice over on #scipy, it seems that there is really no reliable way of disabling BLAS support when compiling NumPy on OS X (and possibly no way at all; the docs we were relying on are apparently rather old). The upshot is that, because of this bug, the easiest way to get parallel processing in Dedupe is to compile NumPy against the current development version of OpenBLAS (OpenBLAS has yet to release a version that fixes a bug similar to the one we are up against in the OS X framework). We prepared some instructions for that in the wiki here: https://github.com/datamade/dedupe/wiki/OSX-Install-Notes. In the meantime, I'll go ahead and update the README with this new finding. Thanks for helping us uncover it!
It would seem that Apple’s implementation of BLAS does not support making BLAS calls on both sides of a fork. The upshot is that, when NumPy functions that rely on BLAS are called within a forked process (such as the ones created when you push a job into a multiprocessing pool), the forked process might never fully exit. This means you end up with orphaned processes until the process that originally forked them exits (or at least that’s what seems to be happening with Dedupe).
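A hypothetical smoke test for the behaviour described above: run a callable once in the parent, then again in a forked child, and report whether the child finishes within a timeout. `survives_fork` is an illustrative name and the default timeout is arbitrary. It requests the "fork" start method explicitly, so it is POSIX-only.

```python
import multiprocessing
import sys


def survives_fork(work, timeout=10.0):
    """Return True if `work` completes in a forked child after running in the parent."""
    if sys.platform == "win32":
        raise RuntimeError("the 'fork' start method is not available on Windows")
    work()  # use the library in the parent first, as the bug needs both sides of the fork
    ctx = multiprocessing.get_context("fork")
    child = ctx.Process(target=work)  # with fork, the target is not pickled
    child.start()
    child.join(timeout)
    if child.is_alive():
        # The child is stuck: clean it up and report failure.
        child.terminate()
        child.join()
        return False
    return child.exitcode == 0
```

With NumPy installed, passing something like `lambda: numpy.dot(m, m)` for a large matrix `m` should, if the diagnosis above is right, return `False` on a Mac whose NumPy is linked against Accelerate; that has not been verified here.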
In the short term, the way to go might seem to be disabling multiprocessing for Apple users. In the longer term, it would make sense to instruct Apple users who encounter this kind of funny business to install a different BLAS implementation (unless Apple fixes it, which is doubtful).