
BUG: multiprocessing scheduler silently hangs with lock task argument #1342

Closed
nirizr opened this issue Jun 29, 2016 · 5 comments · Fixed by #1343

@nirizr (Contributor) commented Jun 29, 2016

When a multiprocessing lock is passed as a task argument in a dask graph and the multiprocessing scheduler is used (even if the lock itself is never used), dask silently hangs when multiprocessing.get is called.
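
A minimal reproduction sketch of the hang described above (the `work` function and graph key are made up for illustration):

```python
import multiprocessing

from dask.multiprocessing import get

lock = multiprocessing.Lock()

def work(x, lock):
    # The lock is never acquired; merely passing it into the graph is enough.
    return x + 1

dsk = {'out': (work, 1, lock)}

# Expected: a result, or at least a loud serialization error.
# Actual: this call hangs silently.
get(dsk, 'out')
```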

PR #1341 adds an xfail test that demonstrates this.

@jcrist (Member) commented Jun 29, 2016

Multiprocessing locks aren't serializable, but cloudpickle.dumps(lock) won't fail (although it probably should). It does fail on load though, which tears down the worker process and causes the scheduler to hang.
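
A sketch of the failure mode described above, as reported at the time (cloudpickle's handling of locks may have changed since):

```python
import multiprocessing
import pickle

import cloudpickle

lock = multiprocessing.Lock()

data = cloudpickle.dumps(lock)  # succeeds, even though it arguably shouldn't
pickle.loads(data)              # raises in the worker process, tearing it down
```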

What is your expected behavior here? A lock exists only within a single process; serializing it between processes (even if that worked) wouldn't provide inter-process locking. A few options:

If you only need locking in each process, but don't care if many processes access the same resource concurrently, then a small wrapper class around multiprocessing.Lock that creates a new lock when unpickled would probably work.
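
A hedged sketch of such a wrapper (the ProcessLocalLock name and interface are illustrative; note that each process ends up with its own independent lock, so there is no cross-process exclusion):

```python
import multiprocessing

class ProcessLocalLock:
    """A lock that survives pickling by recreating itself on load."""

    def __init__(self):
        self._lock = multiprocessing.Lock()

    def __getstate__(self):
        # Drop the unpicklable lock; nothing else needs to travel.
        return {}

    def __setstate__(self, state):
        # Each unpickling process gets a fresh, unrelated lock.
        self._lock = multiprocessing.Lock()

    def acquire(self, *args, **kwargs):
        return self._lock.acquire(*args, **kwargs)

    def release(self):
        return self._lock.release()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc):
        self._lock.release()
```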

@nirizr (Contributor, Author) commented Jun 29, 2016

Thanks for the in-depth explanation!

This came up in #1293, where the desired behavior was to prevent multiple processes from using the same resource (an HDF file) simultaneously.
@mrocklin already suggested locket, but unfortunately locket uses threading.Lock, which isn't serializable and fails when used with Python's multiprocessing module.

AFAIK a multiprocessing.Lock is supposed to be shareable between multiple processes by using a single global lock instead of serializing it (see this Stack Overflow question), which is what I did as a workaround in #1293. I now see that your recommendation of using multiprocessing.Manager().Lock() works, so I'll use that instead of a global lock. Thanks!
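
For reference, a minimal sketch of the Manager-based approach (the function and graph key are illustrative): the manager hands out a picklable proxy, so the lock can travel through the graph and still coordinates across processes.

```python
import multiprocessing

from dask.multiprocessing import get

manager = multiprocessing.Manager()
lock = manager.Lock()  # a proxy object; pickles cleanly

def write_safely(x, lock):
    with lock:  # real inter-process exclusion via the manager process
        return x + 1

dsk = {'out': (write_safely, 1, lock)}
result = get(dsk, 'out')  # no hang, and the lock actually coordinates workers
```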

The purpose of reporting this issue is mostly to make the problem visible and documented in case it comes up again, and maybe to add an explicit exception or warning to the code in the future.

@jcrist (Member) commented Jun 29, 2016

unfortunately locket uses threading.Lock which isn't serializable and fails when used with python's multiprocessing module.

That's a good point. If you serialize the lock file path and create a new locket.lock_file on the other end, this would work. It's interesting that they don't support serialization, though.
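
A hedged sketch of that approach (PicklableFileLock is an illustrative name, not part of locket): only the path is pickled, and a new locket lock is created on the receiving end, so exclusion still goes through the same lock file.

```python
import locket

class PicklableFileLock:
    """Wraps locket.lock_file so the lock can cross process boundaries."""

    def __init__(self, path):
        self.path = path
        self._lock = locket.lock_file(path)

    def __getstate__(self):
        return self.path  # serialize just the path

    def __setstate__(self, path):
        self.path = path
        self._lock = locket.lock_file(path)  # recreate against the same file

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc):
        self._lock.release()
```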

I fixed the deadlock problem in #1343, so the scheduler won't lock up on deserialization errors.

@nirizr (Contributor, Author) commented Jun 29, 2016

Thanks for the extremely fast fix!

This makes #1341 redundant, right? Should I close it, or would you like to keep the tests anyway?

@mrocklin (Member) commented:

We could ping the locket guy. He's pretty responsive.

@sinhrks added this to the 0.10.1 milestone Jul 12, 2016