Provide clearer message for multiprocess error #2520

Closed
djhoese opened this issue Feb 9, 2019 · 5 comments
Labels
documentation Improve or add to documentation good first issue Clearly described and easy to accomplish. Good for beginners to the project.

Comments

@djhoese

djhoese commented Feb 9, 2019

See the discussion on #2515 for details. In summary, if a user tries to use the Client object or uses multiprocessing in an unexpected way (especially when first starting out), they can run into this error:

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The usual fix is to make sure the script's entry point sits inside an if __name__ == '__main__': block. If that doesn't apply to you, you usually have to do some fancier handling. Is there a way for distributed/dask to catch the above exception and provide a simpler or clearer message?

@mrocklin
Member

mrocklin commented Mar 8, 2019

To be clear, this fails if run within a script (it works fine in an interpreter):

from dask.distributed import Client
client = Client()
# user code follows

The solution is this:

from dask.distributed import Client

if __name__ == '__main__':
    client = Client()
    # user code follows

This is exactly the same problem that exists with anything in Python that spins up processes, like a multiprocessing.Pool().

Another alternative is to avoid processes altogether with Client(processes=False), but that has other performance implications.
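For comparison, the same idiom with the standard library's multiprocessing.Pool looks roughly like this. It is a minimal sketch; `square` and `main` are illustrative names, not taken from the thread.

```python
import multiprocessing


def square(x):
    # Defined at module level so child processes can import and call it.
    return x * x


def main():
    # Processes are only started from here, never at import time.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Only the process that launched the script reaches this line.
    print(main())
```

Moving the `multiprocessing.Pool(...)` call out of `main()` and up to module level is the pattern that can trigger the RuntimeError above on platforms that do not start children with fork.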

tlozsolt added a commit to tlozsolt/TractionRheoscopy that referenced this issue Jul 28, 2020
Don't know if this will work, but found a suggestion on:

dask/distributed#2520

which is exactly the problem I am having. The code works when run in an interpreter and gives the stated error when run in a script.
keflavich added a commit to ALMA-IMF/reduction that referenced this issue Dec 1, 2020
gbaier added a commit to gbaier/nllrtv that referenced this issue Dec 16, 2020
@rgoggins

rgoggins commented May 9, 2021

This works for me. I am curious, though: why does this fix the issue?

@djhoese
Author

djhoese commented May 9, 2021

@rgoggins This has to do with how the additional/child processes are created. Python has to "import" your script(s) in every child process. If you don't put initialization code (code that should only be run once) into the if __name__ == "__main__": block then it gets run for every child process (at "import" time). This can end up with an infinite recursion as each process creates child processes that create more child processes and so on.

This is how I understand it at least.
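The effect described above can be reproduced without starting any processes. The sketch below runs the same tiny script twice with `runpy.run_path`: once under the name `__main__`, as `python user_script.py` would, and once under `__mp_main__`, which, as far as I know, is the name CPython's spawn start method gives the main module when it re-imports it in a child. `SCRIPT`, `run_as`, and `user_script.py` are made-up names for this illustration.

```python
import os
import runpy
import tempfile

SCRIPT = '''
events = ["module level ran"]            # runs on every import
if __name__ == "__main__":
    events.append("guarded block ran")   # runs only in the launching process
'''


def run_as(run_name):
    """Execute SCRIPT under the given module name and return what ran."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "user_script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path returns the module's globals after execution.
        return runpy.run_path(path, run_name=run_name)["events"]


as_parent = run_as("__main__")     # how `python user_script.py` runs it
as_child = run_as("__mp_main__")   # assumed name used for spawned children
```

Here `as_parent` contains both entries while `as_child` contains only the module-level one, so anything outside the guard (such as `Client()`) would run again in every child.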

@GenevieveBuckley GenevieveBuckley added documentation Improve or add to documentation good first issue Clearly described and easy to accomplish. Good for beginners to the project. labels Oct 19, 2021
@GenevieveBuckley
Contributor

I've seen several issues where people are running into this - where would be a good place to clarify the expected usage in the documentation?

@GenevieveBuckley
Contributor

Ah - a discussion for that is already happening over at #2708

In that case, this issue can be closed as a duplicate.


4 participants