Nonblocking communication with boost::mpi::any_source not working for serialized communication #63
The problem is indeed that the count and data messages get mixed up. There is no trivial fix for this in conjunction with boost::mpi::any_source. In my opinion there are two options: use a different tag value for the second message of serialized isend and serialized irecv (a 1:1 mapping from the user-supplied tag value, which is used for the first message), or forbid serialized irecv from boost::mpi::any_source completely. Regarding a different tag value: applications are allowed to use tag values in the range 0 .. MPI_TAG_UB, so using anything outside that range would be highly dependent on the implementation of the MPI library in use. |
Maybe there is another possibility (need to investigate though): the status of the size message seems to provide the actual source. If true, maybe we can force the data request to wait on that same source? |
Which is what we are already doing...
|
Ok so now I get it, sorry. |
There might be another solution. Maybe we can ask the user for a second, usually optional, tag and make it mandatory when using any_source. As of now, the check would be at run time, unless we turn any_source into a tag type (struct any_source_t {} any_source;) so that the compiler can force a second tag. The user would still be able to use the value MPI_ANY_SOURCE directly, but then we could warn. |
We need to merge #66 before we proceed with this issue as it will impact that specific code. |
Asking for a second tag on serialized non-blocking communication indeed seems to be a good possibility given the options. |
I just discussed this issue with a colleague and we came to the conclusion that it will still not work, even with two user specified tags. Imagine the following: Rank 0 does two successive (patched to take two user tags) isends to rank 1 with tag values C and D (which the new implementation would use for count and data messages, respectively). Rank 1 issues two irecvs with matching tags C and D and stores the requests in r1 and r2. Now r1 matches the count message of the first isend and r2 matches the count message of the second isend because MPI guarantees message ordering. But if the user then for some reason called |
Correct, this needs more thinking. Unfortunately I won't be able to work on it for the coming 10 days. We could (using an any_source-type tag as proposed earlier) prohibit its usage on non-atomic messages at compile time. But that would prevent people from using a feature that could still work with some reasonable precautions. |
Another option that we discussed long ago was to use another communicator for the contents of the nonblocking message.
Matthias
… On Aug 6, 2018, at 17:13, Steffen Hirschmann wrote:
Asking for a second tag on serialized non-blocking communication indeed seems to be a good possibility given the options.
|
I don't know what has been discussed in the past. Do you have a reference to prior discussions about this or related problems? Anyway, let me quickly note some thoughts on using different communicators. Using just one pre-dupped communicator (e.g. a second MPI_Comm as part of every boost::mpi::communicator) that is responsible for all data messages does not solve the problem described above. Using a new communicator for every single nonblocking point-to-point communication should work; however, it is impossible to do on-the-fly because dup, split, etc. on communicators are collective operations. It would require predefining n^2 point-to-point communicators, wouldn't it? |
Maybe it has been part of the past discussion, but what about implementing point-to-point serialized data without a size message? Using a probe instead? |
I think this would solve both issues: not being able to tell a) which messages are data counts and which carry the actual data, and b) which count message belongs to which data message (on the same (source, tag) pair). Simply because there will be no count messages anymore. However, doing it only for serialized data will probably not work, since there are other kinds of data sent as count + message (for example in the std::vector overloads). These would also need to use a communication scheme with only one message being sent. |
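For reference, the single-message receive could be sketched with MPI-3 matched probes roughly as follows (an illustrative fragment, not the actual implementation; `recv_serialized` is a made-up name, error handling is omitted, and compiling it requires an MPI installation):

```c
#include <stdlib.h>
#include <mpi.h>

/* Sketch: receive serialized data in a single message, with no
 * separate count message. MPI_Mprobe blocks until a matching message
 * is found and "claims" it, so no other receive on this communicator
 * can steal it (unlike plain MPI_Probe with MPI_ANY_SOURCE). */
void *recv_serialized(MPI_Comm comm, int tag, int *nbytes_out)
{
    MPI_Message msg;
    MPI_Status status;
    MPI_Mprobe(MPI_ANY_SOURCE, tag, comm, &msg, &status);

    /* The payload size travels with the message itself. */
    MPI_Get_count(&status, MPI_BYTE, nbytes_out);

    void *buf = malloc((size_t)*nbytes_out);

    /* Receive exactly the claimed message; source/tag races are
     * impossible because the message handle pins it down. */
    MPI_Mrecv(buf, *nbytes_out, MPI_BYTE, &msg, MPI_STATUS_IGNORE);
    return buf;
}
```

Because the size is discovered from the probed message rather than shipped ahead of it, there is no count/data pairing left to get out of order.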
Well, we can match serialized sends with serialized receives, but dealing with vector is probably not a big issue anyway. I'm opening a new issue, as this seems more general. |
You can follow #70 if interested in implementation. |
So, a probe version is mostly working, but is limited by what is probably an Intel MPI bug. The following MPI-only code works up to 15 processes on my installation, but fails to find incoming messages starting with 16 processes. It works on my Open MPI installation. Could you try it on your available platform? Thanks.
My installations are:
|
Worked for me on MBP running Sierra with OpenMPI 3.1.1 and random numbers of cores up to 147:
mpiexec --map-by socket:OVERSUBSCRIBE -n 147 a.out
No hangs or other weirdness so probably okay.
Noel
On Aug 24, 2018, at 5:14 PM, Alain Miniussi <notifications@github.com> wrote:
#include <stdio.h>
#include <mpi.h>

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);
    int rank, nproc;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    int value = 42;
    int input;
    int next = (rank + 1) % nproc;
    int prev = (rank + nproc - 1) % nproc;
    int tag = 2;
    MPI_Request sreq;
    MPI_Isend(&value, 1, MPI_INT, next, tag, MPI_COMM_WORLD, &sreq);
    int probe = 0;
    int test = 0;
    MPI_Message msg;
    do {
        if (!test) {
            MPI_Test(&sreq, &test, MPI_STATUS_IGNORE);
            if (test) {
                printf("Proc %i sent msg %i to Proc %i\n", rank, tag, next);
            } else {
                printf("Proc %i has not sent msg %i to Proc %i yet\n", rank, tag, next);
            }
        }
        if (!probe) {
            MPI_Improbe(prev, tag,
                        MPI_COMM_WORLD, &probe,
                        &msg,
                        MPI_STATUS_IGNORE);
            if (probe)
                printf("Proc %i got msg %i from proc %i\n", rank, tag, prev);
            else
                printf("Proc %i has not got msg %i from proc %i yet\n", rank, tag, prev);
        }
    } while (probe == 0 || test == 0);
    MPI_Finalize();
    return 0;
}
|
Works for me on OpenMPI-3.1.1 and some older versions. Also, the code seems okay. |
Fails (on a different Linux cluster, occigen.cines.fr) with:
Fails, on the same cluster (licallo.oca.eu), with:
|
Could you test the code with an MPI_Mrecv before finalizing? I know it sounds paranoid but I think the program could be ill-formed. The standard says in §8.5 (MPI 3.1, p. 357, l. 34ff, MPI_Finalize): "[Before calling MPI_Finalize a process] must locally complete all MPI operations that it initiated and must execute matching calls needed to complete MPI communications initiated by other processes. For example, [...] if the process is the target of a send, then it must post the matching receive; [...]". It could be that Intel MPI relies on this in the implementations you tested? |
Thanks for the remark, the code would indeed be non-conformant as such. |
Please note that I fixed the result comment; it always fails with Intel (I got my SLURM parameters wrong). |
So, it is an issue with Intel's MPI implementation that should be fixed in the soon-to-be-available 2019 version.
@hirschsn is it OK for you to provide the code under the Boost Software License? I'd like to integrate your test case.
This issue seems fixed in #70. |
Yes, I'm okay with that.
|
Posting more than two irecvs with boost::mpi::any_source results in message truncation errors (boost 1.67.0, g++ 7.3, OpenMPI 2.1.1 and newer).
Code:
Symptoms:
If one uses the commented-out code instead of pushing back the request, i.e. directly waiting for the request instead of deferring the wait, the code works. Also, again, for PODs the code works properly.
The symptoms could be explained easily if count and data messages use the same (user provided) tags. Then, count and data messages can get mixed up because of any_source (one irecv receives both counts and one receives both data messages). But this is only speculation. Again, I can do some investigation as soon as I find the time to do it.