MPI builds and Python multiprocessing #1608
Actually, cdms2 "tvariable.py" is calling "from mpi4py import MPI", which calls MPI_Init. It happens automatically if mpi4py exists. I guess we could let the user do "from mpi4py import MPI" before importing cdms2 if they want MPI_Init to be called.
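A minimal sketch of that suggested usage, assuming cdms2 only attaches to MPI when the user has already imported mpi4py themselves (that guard inside cdms2 is the proposed behavior, not what the code does today):
from mpi4py import MPI   # importing mpi4py calls MPI_Init by default
import cdms2             # cdms2 would then pick up the already-initialized MPI
assert MPI.Is_initialized()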
The problem does not come from tvariable.py. It doesn't import mpi4py if it can't find it, and for testing I had entirely deleted mpi4py.
Jeff, I found that setting "rc.initialize = False" will disable MPI_Init():
#!/usr/bin/env python
from mpi4py import rc
rc.initialize = False
from mpi4py import MPI
assert not MPI.Is_initialized()
assert not MPI.Is_finalized()
MPI.Init()
assert MPI.Is_initialized()
assert not MPI.Is_finalized()
MPI.Finalize()
assert MPI.Is_initialized()
assert MPI.Is_finalized()
I tried "rc.initialize = False" but it didn't help - the error message still appears if you try to use multiprocessing. Probably MPI_Init is called someplace not known to mpi4py.
Does this help?
from mpi4py import rc
rc.initialize = False
from mpi4py import MPI
MPI.Init()
import cdms2
import multiprocessing
MPI.Finalize()
This suggestion leads to an error "Calling MPI_Init or MPI_Init_thread twice is erroneous." which strengthens my guess that MPI_Init is called by C code. Calling MPI.Finalize() was a good idea, but it doesn't work yet. If I take out the MPI.Init() call, then I get an error message "The MPI_Comm_rank() function was called after MPI_FINALIZE was invoked." There's nothing in my code which asks for the rank; comm is set to None. So something deep down in the system still thinks that MPI is running.
I changed the milestone to 2.4.1. Version 2.4.1 is supposed to have the high-performance climatology script. On Rhea, this script must use the multiprocessing module. So we have to either (1) drop MPI support entirely, or (2) maintain two versions of UV-CDAT, or (3) fix the problem described above.
I vote for option 3, but my CDMS work only starts in March. January and February are entirely devoted to CMOR and CMIP6. (I am making good progress on code, but need doc/paper.)
Is anybody actually using MPI support in UV-CDAT? The projects I know of (including my own) are not ready for release. If it's nobody, then the quick & easy solution would be to combine my solutions (1) and (3): release 2.4.1 with MPI completely disabled, then put it back in, with the fix, for 2.6. |
Nobody is using it; if you enable it with CDMS flags, then netCDF uses MPI I/O. I just advertised mpi4py and parallel I/O at the AMS meeting and got a lot of interest. I vote to disable it for release 2.4.1 and put it back for 2.6. Jim @mcenerney1 is using mpi4py for metrics.
Jim's not ready for release yet. There's working MPI in climatology.py, but I plan to release it with MPI disabled - on Rhea it's faster to fork multiple processes on a single processor. If we need MPI for best performance on aims4 (I've not done a timing study there), then we may have to support two builds for a little while, my solution (2). |
@painter1 a quick solution is to check the import and return a nice clean error message if mpi is not present. By default UVCDAT is not built with MPI anyway. |
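A rough sketch of that kind of guard; the placement inside cdms2, the HAVE_MPI flag, and the wording of the message are all made up here for illustration:
try:
    from mpi4py import MPI
    HAVE_MPI = True
except ImportError:
    MPI = None
    HAVE_MPI = False

def require_mpi():
    # Called only by the code paths that actually need MPI.
    if not HAVE_MPI:
        raise RuntimeError(
            "This feature requires mpi4py, but UV-CDAT was built without MPI support.")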
@dnadeau4 didn't you push a fix for this before anyway? |
We did, and Jeff is aware of it. MPI is not working well with the Python multiprocessing module.
No, we haven't fixed the problem yet. What we've settled on is that we can get by without fixing it for another month or two. |
So far as I know, nobody has even looked at this problem yet. |
Excuse me, has anyone solved the problem of how to make MPI work well with the Python multiprocessing module?
It should be our goal for UV-CDAT to allow the user to use any standard Python module. It is a bug if you have to use one build of UV-CDAT for one module, and another build for another module.
I want to use the multiprocessing module. This is part of the Python standard, so it is available in every build of UV-CDAT. Sometimes, but not at the same time, I want to use the mpi4py module. This isn't part of the Python standard, but it is widely used and can optionally be built into UV-CDAT.
The problem is that if we build a UV-CDAT with MPI support, then MPI is always running whether you need it or not. This is not a problem on a Macintosh. But on some platforms, OpenMPI (the version we build into UV-CDAT) forbids fork operations. In Python 2.7 and below, the multiprocessing module creates processes by forking them (Python 3.x allows other, slower, methods). So it won't work!
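For reference, a minimal sketch of how a Python 3 script can ask multiprocessing to avoid fork entirely via the 'spawn' start method (the worker function here is just a placeholder):
import multiprocessing as mp

def work(x):
    return x * x  # placeholder worker

if __name__ == "__main__":
    # 'spawn' starts fresh interpreter processes instead of forking,
    # which sidesteps Open MPI's fork() restriction (Python 3 only).
    mp.set_start_method("spawn")
    with mp.Pool(4) as pool:
        print(pool.map(work, range(8)))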
On Rhea.ccs.ornl.gov, I built a UV-CDAT with MPI support, made sure that no Python code imports mpi4py (by deleting it from site-packages/), and then tried to run a script with the multiprocessing module. I got a warning:
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
and when I ran anyway, my code failed sometimes, always during file open operations.
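A stripped-down version of the kind of script that triggers this, with hypothetical file and variable names; the intermittent failures show up in the workers' cdms2.open calls:
import multiprocessing
import cdms2

def read_var(path):
    # Under an MPI-enabled build, the fork performed by multiprocessing can
    # leave the child's MPI state in a bad state, and the open sometimes fails.
    f = cdms2.open(path)   # hypothetical input files
    data = f("tas")        # hypothetical variable name
    f.close()
    return data.shape

if __name__ == "__main__":
    paths = ["tas_1.nc", "tas_2.nc", "tas_3.nc"]   # placeholders
    pool = multiprocessing.Pool(3)
    print(pool.map(read_var, paths))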
Almost surely, the explanation is that some C code someplace in UV-CDAT is calling MPI_Init. That's necessary if you are using mpi4py, but deadly (on Rhea) if you are using the multiprocessing module.
If the user never imports mpi4py, then we should not call MPI_Init. If the user imports both mpi4py and multiprocessing, then we should call MPI_Init and issue a warning.
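One way that proposed behavior could look in Python, as a sketch only; the check relies on whether mpi4py is already in sys.modules, and nothing like this exists in cdms2 today:
import sys
import warnings

def maybe_init_mpi():
    # Only touch MPI if the user has already imported mpi4py themselves.
    if "mpi4py" not in sys.modules:
        return
    from mpi4py import MPI
    if not MPI.Is_initialized():
        MPI.Init()
    if "multiprocessing" in sys.modules:
        warnings.warn("Both mpi4py and multiprocessing are in use; "
                      "fork()-based multiprocessing may misbehave under Open MPI.")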