
Possible data corruption for Fortran in the presence of MPI_IN_PLACE #46

Open

jgraciahlrs opened this issue May 10, 2023 · 3 comments

@jgraciahlrs

A range of MPI operations allow the send buffer to be reused as the receive buffer by passing the special constant MPI_IN_PLACE in place of the send buffer. For Fortran applications, this can lead to data corruption when executed under mpiP.

The corruption can be demonstrated with this simple code:

PROGRAM sample_allreduce
  USE mpi
  IMPLICIT NONE

  INTEGER :: ierr
  INTEGER :: rank, rank_in_place
  INTEGER :: rank_sum

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  rank_in_place = rank

  PRINT *, 'Rank: ', rank
  ! First reduction: separate send and receive buffers.
  CALL MPI_Allreduce(rank, rank_sum, 1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)
  ! Second reduction: in place, with MPI_IN_PLACE as the send buffer.
  CALL MPI_Allreduce(MPI_IN_PLACE, rank_in_place, 1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)
  PRINT *, 'Sum: ', rank_sum, ' - ', rank_in_place

  CALL MPI_Finalize(ierr)
END PROGRAM sample_allreduce

Executing without mpiP instrumentation produces the expected output:

$ mpirun -np 3 ./a.out                                                           
 Rank:            0
 Sum:            3  -            3
 Rank:            1
 Sum:            3  -            3
 Rank:            2
 Sum:            3  -            3

while executing with mpiP corrupts the data:

$ mpirun -np 3 env LD_PRELOAD=$HLRS_MPIP_ROOT/lib/libmpiP.so ./a.out
mpiP: 
mpiP: mpiP V3.5.0 (Build Mar 16 2023/14:16:24)
mpiP: 
 Rank:            0
 Sum:            3  -            0
 Rank:            1
 Sum:            3  -            0
 Rank:            2
 Sum:            3  -            0
mpiP: 
mpiP: Storing mpiP output in [./a.out.3.1905013.1.mpiP].
mpiP: 

Note that the second column (which used MPI_IN_PLACE) is "0" when it should be "3".

I suspect the underlying problem is missing or incorrect handling of constants such as MPI_IN_PLACE in the transition from the Fortran to the C PMPI interface. A similar problem has been observed in other projects/tools using PMPI, such as here; in fact, the code above is taken from that issue.
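
For illustration, here is a minimal sketch (not mpiP's actual code) of how a Fortran-to-C wrapper for MPI_Allreduce could translate the constant. It assumes Open MPI's mpi_fortran_in_place_ sentinel symbol; the exact spelling depends on the compiler's Fortran name mangling, and MPICH instead declares MPIR_F_MPI_IN_PLACE in mpi.h, so a portable tool would need per-implementation handling:

#include <mpi.h>

/* Open MPI exports the Fortran MPI_IN_PLACE constant under a C symbol
 * like this one (assumed name; the actual spelling varies with the
 * compiler's name-mangling scheme). */
extern int mpi_fortran_in_place_;

void mpi_allreduce_(void *sendbuf, void *recvbuf, MPI_Fint *count,
                    MPI_Fint *datatype, MPI_Fint *op, MPI_Fint *comm,
                    MPI_Fint *ierr)
{
    /* A Fortran caller passing MPI_IN_PLACE hands the wrapper the address
     * of the Fortran constant, not the C sentinel. Without this
     * translation, PMPI_Allreduce treats that address as a real send
     * buffer, which would explain the corrupted result. */
    if (sendbuf == (void *) &mpi_fortran_in_place_)
        sendbuf = MPI_IN_PLACE;

    *ierr = (MPI_Fint) PMPI_Allreduce(sendbuf, recvbuf, (int) *count,
                                      MPI_Type_f2c(*datatype),
                                      MPI_Op_f2c(*op),
                                      MPI_Comm_f2c(*comm));
}

The same translation would presumably be needed for MPI_BOTTOM and for every other wrapper whose Fortran binding accepts MPI_IN_PLACE.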

I have observed this behavior for mpiP v3.4.1 and v3.5 using GCC v10.2 with either OpenMPI v4.1.4 or HPE's MPI implementation MPT 2.26.

Also note that the code runs correctly when use mpi is replaced with use mpi_f08, at least with OpenMPI (but not with MPT).

@jgraciahlrs (Author)

Please ignore my last sentence above: with use mpi_f08, OpenMPI does not invoke mpiP at all, and thus there is no data corruption. I will investigate this as well; it may just have been a careless mistake made in a hurry.

@naromero77 commented Feb 1, 2024

It doesn't look like this has been fixed in any official release of mpiP.

@naromero77

I can confirm that this is still an issue with both OpenMPI and MPICH.

@cchambreau Are there any pointers you could provide that might help arrive at a resolution or workaround?
