MPI builds and Python multiprocessing #1608

Open
painter1 opened this Issue Oct 13, 2015 · 20 comments

@painter1
Contributor

painter1 commented Oct 13, 2015

It should be our goal for UV-CDAT to allow the user to use any standard Python module. It is a bug if you have to use one build of UV-CDAT for one module, and another build for another module.

I want to use the multiprocessing module. This is part of the Python standard library, so it is available in every build of UV-CDAT. Sometimes, but not at the same time, I want to use the mpi4py module. This isn't part of the Python standard library, but it is widely used and can optionally be built into UV-CDAT.

The problem is that if we build UV-CDAT with MPI support, then MPI is always running whether you need it or not. This is not a problem on a Macintosh. But on some platforms, OpenMPI (the version we build into UV-CDAT) forbids fork operations. In Python 2.7 and below, the multiprocessing module creates processes by forking them (Python 3.x allows other, slower, methods). So it won't work!
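(Editorial note: on Python 3, multiprocessing can be told to avoid fork() entirely. A minimal sketch using the "spawn" start method, which launches a fresh interpreter per worker instead of forking, trading slower startup for safety under MPI:)

```python
# Sketch (Python 3 only): the "spawn" start method starts each worker as a
# fresh interpreter rather than via fork(), sidestepping OpenMPI's fork()
# restriction at the cost of slower process startup.
import multiprocessing as mp

def square(x):
    return x * x

if __name__ == "__main__":
    ctx = mp.get_context("spawn")        # use spawn instead of the default fork
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))   # -> [1, 4, 9]
```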

On Rhea.ccs.ornl.gov, I built a UV-CDAT with MPI support, made sure that no Python code imports mpi4py (by deleting it from site-packages/), and then tried to run a script with the multiprocessing module. I got a warning:
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.
and when I ran anyway, my code failed sometimes, always during file open operations.

Almost surely, the explanation is that some C code someplace in UV-CDAT is calling MPI_Init. That's necessary if you are using mpi4py, but deadly (on Rhea) if you are using the multiprocessing module.

If the user never imports mpi4py, then we should not call MPI_Init. If the user imports both mpi4py and multiprocessing, then we should call MPI_Init and issue a warning.
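(Editorial note: the proposed policy could be sketched from the Python side. `ensure_mpi` is a hypothetical helper, not part of UV-CDAT; it assumes mpi4py is built with its `rc.initialize` switch honored:)

```python
# Sketch of the proposed policy: only initialize MPI when the user has
# actually imported mpi4py, and warn if multiprocessing is also loaded.
# ensure_mpi() is a hypothetical helper, not part of UV-CDAT.
import sys
import warnings

def ensure_mpi():
    """Call MPI_Init only if the user has already imported mpi4py."""
    if "mpi4py" not in sys.modules:
        return None                      # user never asked for MPI
    from mpi4py import MPI
    if not MPI.Is_initialized():
        MPI.Init()
    if "multiprocessing" in sys.modules:
        warnings.warn("mpi4py and multiprocessing are both loaded; "
                      "fork()-based multiprocessing may misbehave under MPI")
    return MPI
```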

@painter1 painter1 added the Bug label Oct 13, 2015

@dnadeau4

Contributor

dnadeau4 commented Oct 13, 2015

Actually, cdms2 "tvariable.py" is calling "from mpi4py import MPI", which calls MPI_Init. It happens automatically if mpi4py exists. I guess we could let the user do the call "from mpi4py import MPI" before importing cdms2 if they want MPI_Init to be called.
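(Editorial note: the guarded, side-effect-free import being discussed might look like this; a sketch, not the actual tvariable.py source:)

```python
# Sketch of a guarded MPI import for a library module.  Importing
# mpi4py.MPI normally triggers MPI_Init as a side effect; setting
# rc.initialize = False first leaves initialization to the caller.
try:
    from mpi4py import rc
    rc.initialize = False        # do not call MPI_Init at import time
    from mpi4py import MPI
    HAVE_MPI = True
except ImportError:              # mpi4py not built into this UV-CDAT
    MPI = None
    HAVE_MPI = False
```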

@painter1

Contributor

painter1 commented Oct 13, 2015

The problem does not come from tvariable.py. It doesn't import mpi4py if it can't find it, and for testing I had entirely deleted mpi4py.

@dnadeau4

Contributor

dnadeau4 commented Oct 13, 2015

Jeff,

I found that setting "rc.initialize = False" will disable MPI_Init().

#!/usr/bin/env python

from mpi4py import rc
rc.initialize = False            # keep "import MPI" from calling MPI_Init

from mpi4py import MPI
assert not MPI.Is_initialized()
assert not MPI.Is_finalized()

MPI.Init()                       # initialize explicitly instead
assert MPI.Is_initialized()
assert not MPI.Is_finalized()

MPI.Finalize()
assert MPI.Is_initialized()      # Is_initialized stays True after finalize
assert MPI.Is_finalized()
@painter1

Contributor

painter1 commented Oct 13, 2015

I tried "rc.initialize = False", but it didn't help: the warning message still appears if you try to use multiprocessing. Probably MPI_Init is called somewhere that mpi4py doesn't know about.

@dnadeau4

Contributor

dnadeau4 commented Oct 13, 2015

Does this help?

from mpi4py import rc
rc.initialize = False

from mpi4py import MPI
MPI.Init()

import cdms2
import multiprocessing

MPI.Finalize()
@painter1

Contributor

painter1 commented Oct 14, 2015

This suggestion leads to an error "Calling MPI_Init or MPI_Init_thread twice is erroneous." which strengthens my guess that MPI_Init is called by C code.

Calling MPI.Finalize() was a good idea, but it doesn't work yet. If I take out the MPI.Init() call, then I get an error message "The MPI_Comm_rank() function was called after MPI_FINALIZE was invoked." There's nothing in my code which asks for the rank; comm is set to None. So something deep down in the system still thinks that MPI is running.
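(Editorial note: both failure modes hit here, a double MPI_Init and MPI calls after MPI_Finalize, can be guarded against with mpi4py's query functions. A sketch, wrapped so it also runs where mpi4py is absent:)

```python
# Sketch: guard each MPI lifecycle step with mpi4py's query functions,
# so a script neither calls MPI_Init twice nor touches MPI after finalize.
try:
    from mpi4py import rc
    rc.initialize = False        # keep "import MPI" from calling MPI_Init
    rc.finalize = False          # and from registering an atexit MPI_Finalize
    from mpi4py import MPI
except ImportError:              # mpi4py not built in; nothing to guard
    MPI = None

if MPI is not None:
    if not MPI.Is_initialized():         # avoids "MPI_Init called twice"
        MPI.Init()
    rank = MPI.COMM_WORLD.Get_rank()     # safe: MPI is known to be up
    if not MPI.Is_finalized():           # avoids calls after MPI_FINALIZE
        MPI.Finalize()
```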

@doutriaux1 doutriaux1 added this to the 3.0 milestone Oct 14, 2015

@painter1 painter1 modified the milestones: 2.4.1, 3.0 Jan 15, 2016

@painter1

Contributor

painter1 commented Jan 15, 2016

I changed the milestone to 2.4.1. Version 2.4.1 is supposed to have the high-performance climatology script. On Rhea, this script must use the multiprocessing module. So we have to either (1) drop MPI support entirely, (2) maintain two versions of UV-CDAT, or (3) fix the problem described above.

@dnadeau4

Contributor

dnadeau4 commented Jan 15, 2016

I vote for option 3, but my CDMS work only starts in March. January and February are entirely devoted to CMOR and CMIP6. (I am making good progress on the code, but need doc/paper.)

@painter1

Contributor

painter1 commented Jan 15, 2016

Is anybody actually using MPI support in UV-CDAT? The projects I know of (including my own) are not ready for release. If it's nobody, then the quick & easy solution would be to combine my solutions (1) and (3): release 2.4.1 with MPI completely disabled, then put it back in, with the fix, for 2.6.

@dnadeau4

Contributor

dnadeau4 commented Jan 15, 2016

Nobody is using it. If you enable it with the CDMS flags, then netCDF uses MPI I/O. I just advertised mpi4py and parallel I/O at the AMS meeting and got a lot of interest. I vote to disable it for release 2.4.1 and put it back for 2.6. Jim @mcenerney1 is using mpi4py for metrics.

@painter1

Contributor

painter1 commented Jan 15, 2016

Jim's not ready for release yet. There's working MPI in climatology.py, but I plan to release it with MPI disabled - on Rhea it's faster to fork multiple processes on a single processor. If we need MPI for best performance on aims4 (I've not done a timing study there), then we may have to support two builds for a little while, my solution (2).

@doutriaux1

Member

doutriaux1 commented Jan 19, 2016

@painter1 a quick solution is to check the import and return a nice clean error message if mpi is not present. By default UVCDAT is not built with MPI anyway.
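(Editorial note: the quick fix suggested here might look like the following; `require_mpi` is a made-up name for illustration:)

```python
# Sketch of the suggestion above: probe for mpi4py up front and fail with
# a readable message instead of a bare ImportError deep in a stack trace.
# require_mpi() is a hypothetical helper, not part of UV-CDAT.
def require_mpi():
    try:
        from mpi4py import MPI
    except ImportError:
        raise RuntimeError(
            "This feature needs MPI support, but mpi4py is not available. "
            "Rebuild UV-CDAT with MPI enabled, or use the serial code path.")
    return MPI
```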

@doutriaux1

Member

doutriaux1 commented Jan 19, 2016

@dnadeau4 didn't you push a fix for this before anyway?

@dnadeau4

Contributor

dnadeau4 commented Jan 20, 2016

We did, and Jeff is aware of it. MPI is not working well with the Python multiprocessing module.

@doutriaux1

Member

doutriaux1 commented Jan 21, 2016

@painter1 @dnadeau4 can we close then?

@painter1

Contributor

painter1 commented Jan 21, 2016

No, we haven't fixed the problem yet. What we've settled on is that we can get by without fixing it for another month or two.

@doutriaux1 doutriaux1 closed this Apr 19, 2016

@doutriaux1

Member

doutriaux1 commented Apr 19, 2016

@painter1 @dnadeau4 it was fixed somehow right?

@painter1

Contributor

painter1 commented Apr 19, 2016

So far as I know, nobody has even looked at this problem yet.

@painter1 painter1 reopened this Apr 19, 2016

@doutriaux1 doutriaux1 modified the milestones: 2.6, 2.4.1 Apr 19, 2016

@doutriaux1 doutriaux1 modified the milestones: 3.0, 2.6 May 25, 2016

@hujie-frank

This comment has been minimized.

hujie-frank commented Dec 6, 2017

Excuse me, has anyone solved the problem of making MPI work well with the Python multiprocessing module?
