OS X + NumPy + Multiprocessing = ?? #206

Closed
evz opened this Issue · 13 comments

3 participants

@evz
Owner

It would seem that Apple’s implementation of BLAS does not support using BLAS calls on both sides of a fork. The upshot is that, when NumPy functions that rely on BLAS are called within a forked process (such as the ones created when you push a job into a multiprocessing pool), the forked process might never fully exit. This means you end up with orphaned processes until the process that originally forked them exits (or at least that’s what seems to be happening with Dedupe).

In the short term, the way to go might be to disable multiprocessing for Apple users. In the longer term, perhaps we could tell Apple users who run into this that they will need to install a different BLAS implementation (unless Apple fixes it, which is doubtful).

@evz evz added the bug label
@fgregg
Owner

I think we should check to see if the user has an incompatible BLAS. If BLAS is incompatible, then we should declare the pool using multiprocessing.dummy.

If the user has an incompatible BLAS and asks for more than one process, we should also warn the user that we won't create additional processes.
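
A rough sketch of that idea, not the code that ended up in Dedupe: pick a thread-backed pool from multiprocessing.dummy when the build info mentions Accelerate, and warn if more than one process was requested. The helper names (linked_against_accelerate, choose_pool) are hypothetical, and the assumption that an Accelerate-linked build shows "accelerate" or "veclib" somewhere in its build info is exactly that, an assumption.

```python
import multiprocessing
import multiprocessing.dummy
import warnings


def linked_against_accelerate(build_info):
    """Return True if the build info mentions Accelerate or vecLib.

    Assumption: an Accelerate-linked NumPy exposes one of these strings
    somewhere in its build info (for example in its link arguments).
    """
    text = str(build_info).lower()
    return "accelerate" in text or "veclib" in text


def choose_pool(build_info, num_processes):
    """Return (pool_factory, effective_num_processes) without creating a pool."""
    if linked_against_accelerate(build_info):
        if num_processes > 1:
            warnings.warn(
                "BLAS appears to be Accelerate; no additional "
                "processes will be created."
            )
        # multiprocessing.dummy.Pool has the Pool API but is backed by threads.
        return multiprocessing.dummy.Pool, 1
    return multiprocessing.Pool, num_processes
```

The caller would then do something like pool_factory, n = choose_pool(info, 4) followed by pool = pool_factory(n), so the rest of the code can use the same Pool interface either way.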

@evz, thoughts?

@evz
Owner

Sounds like a sane approach. I guess I just want to make sure we're interpreting the output of that call to numpy.distutils.system_info correctly. When I make that call on OS X I get:

>>> numpy.distutils.system_info.get_info('blas')
{'language': 'f77', 'libraries': ['blas'], 'library_dirs': ['/usr/lib']}

When I call that on the Ubuntu 12.04 server where we have the web deduper running I get:

>>> numpy.distutils.system_info.get_info('blas')
{}

I get the same thing when I make that call on the server using atlas or openblas as well. @fgregg what do you get on your local setup?

@fgregg
Owner
@evz
Owner

A particularly interesting thread, given that I can't actually get any output on the Ubuntu machine for what BLAS was used when compiling NumPy: numpy/numpy#3912

It would seem that NumPy ships with a slower, perhaps Python, implementation of BLAS which it falls back on if the OS does not have one available. So it is possible for someone (like me) to install NumPy without actually having a BLAS library present on their system.
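
Another place to look, as a sketch: instead of numpy.distutils.system_info, read whatever build info the installed NumPy recorded about itself. This assumes the build exposes attributes ending in "_info" on numpy.__config__, which distutils-era builds did and newer ones may not. The helper name collect_build_info is hypothetical, and the stand-in object below only illustrates the shape of the data.

```python
import types


def collect_build_info(config_module):
    """Return {name: value} for every attribute whose name ends in '_info'."""
    return {
        name: value
        for name, value in vars(config_module).items()
        if name.endswith("_info")
    }


# Stand-in for a config module, for illustration only (not real NumPy output).
fake_config = types.SimpleNamespace(
    blas_opt_info={"libraries": ["openblas"]},
    unrelated="ignored",
)
print(collect_build_info(fake_config))

# With a real install (may yield an empty dict on builds without *_info):
try:
    import numpy
    print(collect_build_info(numpy.__config__))
except ImportError:
    print("NumPy is not installed")
```

An empty result would not prove that no BLAS is linked, only that this build did not record it in this form.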

@fgregg
Owner

Implemented a fallback to multiprocessing.dummy if the BLAS library is linked against Accelerate. @evz, let's also provide some guidance in the README.md and then I think we can close this issue.

@evz evz closed this in 5e96ed3
@malkomalko

Can't seem to get this to work. I first pip uninstalled numpy and Dedupe.

Tried following the installation guide:

export BLAS=None
pip install numpy
installed dedupe

Then I run the example:

python csv_example.py

/usr/local/Cellar/python/2.7.5/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Dedupe-0.5-py2.7-macosx-10.8-x86_64.egg/dedupe/api.py:48: UserWarning: NumPy linked against 'Accelerate.framework'. Multiprocessing will be disabled. http://mail.scipy.org/pipermail/numpy-discussion/2012-August/063589.html

@fgregg
Owner

@evz can you look at this

@evz
Owner

@fgregg I think that's the expected behavior, right? If you're running Dedupe on OS X, we are disabling multiprocessing and you get that warning.

@malkomalko What happens after you get the warning?

@fgregg
Owner
@malkomalko

@fgregg that is correct. My understanding was that if I compiled numpy without BLAS, I would be able to take advantage of multiple cores and run in parallel.

If that is not the case, maybe the README should be updated?

@evz I was still able to run the first example you gave. I'm just trying to figure out if in fact we can take advantage of parallelism here as I'd like to test this on a semi big job.

Thanks for this piece of software btw, it's been very useful for me as a learning experience.

@fgregg
Owner

@malkomalko, it should be the case. You found a bug.

@fgregg
Owner

@malkomalko, would it be easy for you to join the freenode irc channel #dedupe?

@evz
Owner

@malkomalko Well, after asking for some advice over on #scipy, it would seem that there is really no reliable way of disabling BLAS support when compiling NumPy on OS X (and possibly not at all; the docs we were relying on are apparently rather old).

The upshot of this is, because of this bug, the easiest way to get parallel processing in Dedupe is to compile NumPy against the current edge of OpenBLAS (they have yet to release a version that fixes a bug similar to the one that we are up against in the OS X framework). We prepared some instructions for that in the wiki here: https://github.com/datamade/dedupe/wiki/OSX-Install-Notes.

In the meantime, I'll go ahead and update the README with this new finding. Thanks for helping us uncover it!
