Multiprocessing bug on macOS #159
Never mind, I thought that the difference in behavior between macOS and Linux meant that we could support even code that avoids the …

---
Re-opening this issue because I think we do want to address it in some way.

**Issue and root cause**

Adding a bit more context and explanation here. Consider the following simple demo program:

```python
import multiprocessing

def inc(x):
    return x + 1

pool = multiprocessing.Pool(2)
print(pool.map(inc, [1, 2, 3, 4]))
```

**Linux**

This works on Linux, because multiprocessing on Linux uses the `fork` start method by default.

**macOS**

On macOS, the behavior of this program depends on whether you use Python 2, Python 3.0-3.7, or Python 3.8+. Python 2's multiprocessing and Python 3.0-3.7's multiprocessing use the `fork` start method.

Python 3.8+ on macOS uses the `spawn` start method by default. This creates a new process running the Python interpreter, and that process re-imports the main module, which is problematic if importing the module has side effects. We can override the default start method on 3.8+ to `fork` to make the above example work:

```python
import multiprocessing

def inc(x):
    return x + 1

multiprocessing.set_start_method('fork')
pool = multiprocessing.Pool(2)
print(pool.map(inc, [1, 2, 3, 4]))
```

Unfortunately, this is considered unsafe and can lead to crashes of the subprocess. The proper fix is to make importing the module side-effect free:

```python
import multiprocessing

def inc(x):
    return x + 1

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    print(pool.map(inc, [1, 2, 3, 4]))
```

**Windows**

Windows does not have a `fork` start method.

**Where does this leave us?**

This issue (supporting multiprocessing in this way, where importing the module has side effects) seems unfixable. However, since multiple people have independently run into this problem, maybe we can do something better than just giving up. If we can detect that importing the module has side effects, perhaps we can give a better error message (and explain how to fix the issue), or perhaps we can disable multiprocessing (so that the user's program still works) and print a warning that they could get a speedup by fixing the issue.

---
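One possible shape for the detection idea above (a sketch only, not something cleanlab implements): Python 3.8+ provides `multiprocessing.parent_process()`, which returns `None` only in the main process. Since a `spawn` worker re-imports the main module, a library could use this to notice it is running inside a worker and skip (or warn about) import-time side effects. The `heavy_startup` name is purely illustrative:

```python
import multiprocessing

def heavy_startup():
    # Stands in for an import-time side effect
    # (creating pools, loading data, spawning threads, ...).
    print("running import-time side effect")

# In a "spawn" worker the main module is re-imported, and there
# multiprocessing.parent_process() is non-None (Python 3.8+). In the
# original main process it is None, so the side effect runs only once
# instead of recursing in every spawned child.
if multiprocessing.parent_process() is None:
    heavy_startup()
```

This would let a library degrade gracefully instead of looping, at the cost of silently changing behavior in workers, which is why the warning-message variant may be preferable.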
When it is `None`, we could internally set …

---
That seems fine; it also seems okay to leave it as-is (it's been this way for a long time, and other libraries also have this issue).

---
Agree - also seems ok to leave as is for now.

---
User reported running into this problem in our Slack community: |
Report from another user (Robin) who encountered this issue in PyCharm, but the same code worked in a Jupyter Notebook. Their problem was solvable in two ways: either set …

---
I can reproduce this issue on macOS Ventura 13.5.2 (Apple M1) with Python 3.10.12 when using the Datalab interface for finding issues. Example:

```python
import numpy as np
from cleanlab import Datalab

features = np.random.random((100, 2))
labels = np.array([0] * 50 + [1] * 50)
pred_probs = features / features.sum(axis=1, keepdims=True)

lab = Datalab(data={"features": features, "y": labels}, label_name="y")
lab.find_issues(features=features, pred_probs=pred_probs)
```

Output:

No problems when running on Jupyter. @jwmueller's suggestion of wrapping the code works well.

---
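The wrapping fix mentioned above means moving the module-level calls under an `if __name__ == '__main__':` guard. Here is a runnable stand-in for that pattern that uses plain multiprocessing instead of the cleanlab calls (the `normalize` helper is illustrative, not cleanlab API):

```python
import multiprocessing
import numpy as np

def normalize(row):
    # Toy per-row computation standing in for the real workload.
    return row / row.sum()

def main():
    features = np.random.random((100, 2))
    with multiprocessing.Pool(2) as pool:
        normalized = pool.map(normalize, list(features))
    print(len(normalized))

# The guard prevents the Pool creation from re-running when the "spawn"
# start method re-imports this module in each worker process.
if __name__ == '__main__':
    main()
```

With the guard in place, the spawned workers import the module, find only function definitions, and then execute just the task they were handed.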
macOS 12.1 (on Intel hardware) + Python 3.9.10, cleanlab 3f17be6. Running the code below causes an infinite loop of exceptions.
To reproduce, save this as `bug.py` and run it:

Output:
On a different platform (Debian 10, Linux 4.19.0-11-amd64, Python 3.7.3), this bug does not occur, and running the code produces the expected output:
The user can work around the issue with the following, but we should fix cleanlab so that the above version of the code works.
This code produces the expected output on macOS.