Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check/set multi-processing start method in patpass #347

Merged
merged 12 commits into from
Jan 18, 2022

Conversation

swhite2401
Copy link
Contributor

This branch fixes a problem that was found by @simoneliuzzo when running python 3.9 and MACOS 12.0.1: python mulitprocessing uses "spawn" as default start method which causes the following Runtime error:

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

This error is not clear to me as freeze_support() is supposed to be acitve only for Windows...

Nevertheless, for now I can propose the following:
-linux change nothing it is working
-MACOS: force "fork" start_method
-Windows: return a warning and run single process lattice_pass

This may not be the best approach but it is very difficult for me to validate this as I have no way of testing either MAC or Windows execution.

Help would be most appreciated!

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

I don't want to default the start method to 'fork' on MacOS since it's discouraged by python. I never got the error mentioned by @simoneliuzzo, so I think we should first try and understand in which conditions this error occurs. The fact that freeze_support does not act on MacOS suggests a python bug. Do you get this error with different python versions?

In addition, as of python 3.10, the fork method is not available any more on MacOS…

In the mean time, the workaround you found may be applied by the user if necessary, and it could be suggested in the patpass documentation, but clearly mentioning the python warning:

"Changed in version 3.8: On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess. See bpo-33725."

More globally, the startup of subprocesses in python seems to evolve rapidly, so I think it would be safer not to rely on the specificity of the fork method, and have code with runs with any start method…

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

In addition, as of python 3.10, the fork method is not available any more on MacOS…

Sorry, it's not so clear from the documentation, depending whether MacOS is considered Unix or not…

@swhite2401
Copy link
Contributor Author

I don't understand everything I must admit but python switched from fork to spawn for macos at version 3.8, since you never got the error could you check the start method you are using? multiprocessing.get_start_method()

For all the test I did using spawn produced errors or did not provide any speed up, so it is basically useless. I do not see any interest in providing a default behavior that does not work.

It can be stated in the help that the default is fork, with the warning you mention.

@swhite2401
Copy link
Contributor Author

In addition, as of python 3.10, the fork method is not available any more on MacOS…

Sorry, it's not so clear from the documentation, depending whether MacOS is considered Unix or not…

The only thing I could find (also in python 3.10 docs) is for python 3.8+ the default method switched from "fork" to "spawn" for MACOS, my feeling is that you did not get error because you were using fork

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

@swhite2401: Do you have an example (simple enough) triggering the problem? I cannot reproduce it, neither on python 3.8 nor 3.9 and spawn method…

@swhite2401
Copy link
Contributor Author

import at
import numpy
from at.tracking import patpass 

path = '../lattice/'
filename = 'S28F.mat'
key = 'LOW_EMIT_RING_INJ'
latticef = path+filename
ring = at.load_lattice(latticef,key=key)
ring.radiation_off()


parts=numpy.zeros((6,200))
patpass(ring,numpy.asfortranarray(parts),nturns=1024)

Replacing line 50 in this version of patpass by:
with multiprocessing.get_context('spawn').Pool(pool_size) as pool:

Using linux with python3.8 on grappa will give this error

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

@swhite2401: I only have my laptop here, so I run the tests with only 2 cores. But for the hmba test lattice, 500 particles, I do get a factor ~2 improvement with patpass. And no error

MacOS 12.5, python 3.9, AT on the master branch.

@swhite2401
Copy link
Contributor Author

I can't give you any better without a mac computer...

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

Do you also get the error on linux ? I thought it was only Windows and MacOS?

@swhite2401
Copy link
Contributor Author

Yes on linux forcing the "spawn" start method I get it

@swhite2401
Copy link
Contributor Author

I found a workaround calling atpass inside the block:

if __name__ == '__main__':

but is not an option for an at module...

@swhite2401
Copy link
Contributor Author

@swhite2401: I only have my laptop here, so I run the tests with only 2 cores. But for the hmba test lattice, 500 particles, I do get a factor ~2 improvement with patpass. And no error

MacOS 12.5, python 3.9, AT on the master branch.

@simoneliuzzo got the error on MacOS 12.0.1 and python 3.9... not sure if this explains why you do not get it

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

Instead of using a __main__ block, you could also call set_start_method() in at.__init__.py. Unless set_start_method() is called anywhere else during a python run, it will be called only once, at startup.

For tests, you can just call set_start_method() once immediately after importing at.
I do that successfully on macOS, but of course to switch from 'spawn' to 'fork' (again without error).

However, your Linux example is wrong, because patpass as it is checks the platform to decide the strategy, instead of checking the start method. 'spawn' on linux cannot work. This has to be changed!

I recall that GitHub runs a test of patpass, for all python versions and all platforms, on each push, and it never fails. So it could be a problem with your python installation, or a wrong use somewhere.

Again, without a example of failure of the master branch of AT, I do not want to change anything. And only @simoneliuzzo can provide that…

@swhite2401
Copy link
Contributor Author

Well for linux there was this global variable issue as well, this is why we decided to put this check on the OS to start with, but it is true this has to be cleaned up once we know exactly what to do

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

Easy modification.
patpass line 46, change:

if platform.startswith('linux')

with

if (get_start_method() == 'fork')

Only 'fork' allows inheriting global variables, independent of platform.

@swhite2401
Copy link
Contributor Author

Easy modification. patpass line 46, change:

if platform.startswith('linux')

with

if (get_start_method() == 'fork')

Only 'fork' allows inheriting global variables, independent of platform.

Why did we allow it only for linux in the first place? Was it because of this unsafe warning for MACOS?

@swhite2401
Copy link
Contributor Author

Also this may be a stupid question but why is spawn working for MACOS and not linux, I do not see any reason for that...

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

I think we just assumed "linux <=> fork", which is the default, but is not always true!
And spawn will work on linux if we do not try to use global variables with it, as it is the case now.

@swhite2401
Copy link
Contributor Author

I think we just assumed "linux <=> fork", which is the default, but is not always true! And spawn will work on linux if we do not try to use global variables with it, as it is the case now.

It doesn't, I get the following error:

RuntimeError: context has already been set

and this is normal because spawn re-instanciates the full python for each process, this is why you have to guard the part the launches the multiprocessing inside if __name__ == '__main__': and same goes for set_start_method()

I have no idea why this works on your MAC...

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

Forget contexts and __main__. Make the modification in patpass, start a new python session and call set_start_method('spawn') right after importing at.

@swhite2401
Copy link
Contributor Author

set_start_method('spawn')

Same error, context already set...

@swhite2401
Copy link
Contributor Author

Some news on test cases with spawn:
1-open a python session and copy the code inside the session OK
2-run a python script (python test.py) with call to patpass guarded by if __name__ = '__main__' OK
3-run a python script (python test.py) without guard NOT OK

Not sure how the git tests work... is it equivalent to 1 or 2?

For sure the at code should be working in all 3 conditions without constraints because users may want to use either one of them.

For information, fork method for my example is 3x faster than spawn.

The initial proposal to have 'fork' as default for MAC and Linux is therefore well justified in my opinion (faster, easier). In case 'spawn' has to be used a warning should be issued such that the user can properly set his code to avoid RunTime errors.

@swhite2401
Copy link
Contributor Author

@swhite2401:

  • I do not deactivate anything, but for me $ python starts python 2.7. I have to use $ python3. Why is it different for you ?
  • Now for me, the simple fact of importing at between import multiprocessing and set_start_method makes it crash. Otherwise it does not…

This is a nightmare…

Yes...I don't understand what is going on...

My terminal loads automatically my home pyenv (3.8) -> no error
Then I deactivate it, it switches to the default conda env -> error
I do conda deactivate -> no error

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

@simoneliuzzo: you should remove the try/except in you test: here we do not know if the set_start_method succeeded or failed.

For me, it always succeeds, I never got a failure…

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

@swhite2401: it indeed looks like a configuration problem. GitHub uses a fresh python install and has no problems (though we do not try using 'spawn').

Another test on grappa, still using the default python3 (I do not know anything else):

grappa:~ $ python3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing as mp
>>> mp.set_start_method('spawn')
>>> import at
>>> mp.get_start_method()
'spawn'
>>> 

Here I set the method before importing at, and it works (up to there, I did not spawn anything).

But again I strongly discourage deviating from the default: many risks, no benefit ! We already lost a lot of time on that…

@simoneliuzzo
Copy link
Contributor

@simoneliuzzo: you should remove the try/except in you test: here we do not know if the set_start_method succeeded or failed.

For me, it always succeeds, I never got a failure…

Screenshot 2022-01-11 at 19 04 21

@simoneliuzzo
Copy link
Contributor

@simoneliuzzo: Do you confirm that you get no error on MacOS, with both methods ? Can you see a speed difference between both ? And how did you get a crash in the past ?

I do not know if I can reproduce the conditions of the error immediately. If there is a code snippet to test, I am glad to run it!

@lfarv
Copy link
Contributor

lfarv commented Jan 11, 2022

@simoneliuzzo: I never got any error with patpass on MacOS. That's why I am suspicious. But you used it much more than I did, so you are the only one able to provide a failing example. Otherwise, this PR solves the problem of using non-default methods, but does not change the default behaviour.

I would just be interested in a speed comparison of 'fork' and 'spawn' on Mac, on a rather long tracking, so that the overhead of spawning in small. Thanks !

@swhite2401
Copy link
Contributor Author

swhite2401 commented Jan 12, 2022

@simoneliuzzo could you run the following?

import at
import numpy
from at.tracking import patpass
import time
import multiprocessing as mp

mp.set_start_method('spawn')  

path = '../lattice/'
filename = 'S28F.mat'
key = 'LOW_EMIT_RING_INJ'
latticef = path+filename
ring = at.load_lattice(latticef,key=key)
ring.radiation_off()

parts=numpy.zeros((6,200))
t0 = time.time()
patpass(ring,numpy.asfortranarray(parts),nturns=1024)
print(time.time()-t0)

@simoneliuzzo
Copy link
Contributor

Screenshot 2022-01-12 at 08 43 58

@swhite2401
Copy link
Contributor Author

@swhite2401: it indeed looks like a configuration problem. GitHub uses a fresh python install and has no problems (though we do not try using 'spawn').

Another test on grappa, still using the default python3 (I do not know anything else):

grappa:~ $ python3
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import multiprocessing as mp
>>> mp.set_start_method('spawn')
>>> import at
>>> mp.get_start_method()
'spawn'
>>> 

Here I set the method before importing at, and it works (up to there, I did not spawn anything).

But again I strongly discourage deviating from the default: many risks, no benefit ! We already lost a lot of time on that…

For info: from what I have read online (although this is very poorly documented) using fork is unsafe only in multi-threaded programs which is not the case here, so one may claim that fork is safe for AT but this I really cannot guaranty...

Benefits: never fails (unless you run into this unsafe condition), faster.

This is not lost time! We are using this function extensively, it is part of an ongoing development for DA and lifetime calculation and we want it to work for all the machines we are using in all conditions without having to place expert parameters

@swhite2401
Copy link
Contributor Author

Screenshot 2022-01-12 at 08 43 58

Ah thanks! So it works in the end! And is only ~10s slower so it is not to bad in fact

Ok then I don't understand what happened with the DA module...I will clean this version that uses the python default then, we may revive the discussion when I do the PR for the DA module

@swhite2401
Copy link
Contributor Author

@lfarv, @simoneliuzzo this latest version uses the default python as suggested and fixes issues present in the previous version.
The possibility to change the start method was added in case users face difficulties with the default. Help was improved in this respect.

I think this is ready to merge.

@swhite2401 swhite2401 marked this pull request as ready for review January 12, 2022 08:55
@swhite2401 swhite2401 requested a review from lfarv January 12, 2022 08:55
@swhite2401 swhite2401 changed the title force start method to fork check/set multi-processing start method in patpass Jan 12, 2022
pyat/at/tracking/patpass.py Outdated Show resolved Hide resolved
@lfarv
Copy link
Contributor

lfarv commented Jan 12, 2022

At 1st glance, it looks ok. I'll look more carefully this afternoon, give me some time.
What about the strange behaviour on grappa ? All "context" errors disappeared ?

@swhite2401
Copy link
Contributor Author

At 1st glance, it looks ok. I'll look more carefully this afternoon, give me some time. What about the strange behaviour on grappa ? All "context" errors disappeared ?

grappa is a mystery... but it looks like it is related to some configuration issue that I do not understand

@swhite2401
Copy link
Contributor Author

Ah by the way if you feel uneasy about the force keyword, we can use multiprocessing.get_context(start_method) with try/except instead and pass it to _atpass, but I find this implementation simpler

@lfarv
Copy link
Contributor

lfarv commented Jan 12, 2022

I am not happy with this version:

  1. There is a problem with you logic:
    If you call patpass twice with the default parameters:
    on the 1st call, get_start_method returns None, nothing happens, ok.
    on the 2nd call, get_start_method returns 'spawn', so you force set_start_method to None !
  2. Without using contexts, the start method is global, for the whole python run, and normally specified once for ever (the force keyword to modify it is not documented). If in the future we have other functions using parallelism, this global start method will be used in all functions, so it should be specified globally, not at the function level. To keep the choice at the function level, we need a dedicated context for each function. I do not see any reason to use different start methods in different functions, but that's the only way to have a start_method argument in each function.
  3. Using a context makes the logic much simpler: create a context with the input parameter, and that's all, no test, no comparison. It solves point 1.

@swhite2401
Copy link
Contributor Author

I am not happy with this version:

  1. There is a problem with you logic:
    If you call patpass twice with the default parameters:
    on the 1st call, get_start_method returns None, nothing happens, ok.
    on the 2nd call, get_start_method returns 'spawn', so you force set_start_method to None !
  2. Without using contexts, the start method is global, for the whole python run, and normally specified once for ever (the force keyword to modify it is not documented). If in the future we have other functions using parallelism, this global start method will be used in all functions, so it should be specified globally, not at the function level. To keep the choice at the function level, we need a dedicated context for each function. I do not see any reason to use different start methods in different functions, but that's the only way to have a start_method argument in each function.
  3. Using a context makes the logic much simpler: create a context with the input parameter, and that's all, no test, no comparison. It solves point 1.

Ah right, even though set_start_method(None) restore the default so it is not a problem the logic is strange I agree. Anyway I was hesitant with context, if you like it better it can do it easily

@lfarv
Copy link
Contributor

lfarv commented Jan 12, 2022

I think that the globvar keyword is no more necessary. Why using the 'fork' method without taking benefit of global variables…

@swhite2401
Copy link
Contributor Author

I think that the globvar keyword is no more necessary. Why using the 'fork' method without taking benefit of global variables…

Right!

@lfarv
Copy link
Contributor

lfarv commented Jan 12, 2022

set_start_method(None) restore the default

Well, again that's not documented: "method can be 'fork', 'spawn' or 'forkserver'."
Even if it works I feel better sticking to documented behaviour…

@swhite2401
Copy link
Contributor Author

Done. Can you check if this runs fine for you, again I can only check fork...spawn still fails with this runtime error suggesting to use freeze_support()

@lfarv
Copy link
Contributor

lfarv commented Jan 12, 2022

OK with 'spawn' and 'fork'. It works for me ! Ok for merging.

@swhite2401 swhite2401 merged commit db18253 into master Jan 18, 2022
@swhite2401 swhite2401 deleted the atpass_start_method branch January 18, 2022 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants