
llsubmit error with ASAGI #46

Open
git-taufiq opened this issue Aug 13, 2018 · 4 comments

@git-taufiq
Contributor

Hi all,

I am trying to use ASAGI to input the stress tensor and the dynamic parameters from NetCDF files in fault.yaml.
I got these error messages when submitting the job with llsubmit:
```
Abort(1) on node 29 (rank 29 in comm 1140850688): Fatal error in MPI_Allreduce: Message truncated, error stack:
MPI_Allreduce(912).......: MPI_Allreduce(sbuf=0x7ffd066738b8, rbuf=0x7ffd06677710, count=1, MPI_DOUBLE_PRECISION, MPI_MIN, MPI_COMM_WORLD) failed
MPIR_Allreduce_impl(769).:
MPIR_Allreduce_intra(419):
MPIC_Sendrecv(467).......:
MPIDI_Buffer_copy(73)....: Message truncated; 260 bytes received but buffer size is 8
```
Do you have any idea how to fix this problem?

@Thomas-Ulrich
Contributor

Hi,
It looks like a bug in ASAGI/easi. It seems that the case in which not all partitions have a fault boundary condition has not been handled properly.
In particular, Sebastian designed a dedicated MPI communicator containing only the ranks with a fault boundary, which could be used instead of the communicator containing all ranks.
Thomas.
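
For illustration, a minimal sketch of such a fault-only communicator (hypothetical code, not the actual SeisSol/ASAGI implementation; the `hasFault` flag below stands in for whatever the partition actually knows about its fault elements):

```cpp
#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // In SeisSol this flag would come from the mesh partition; here we just
  // pretend that even ranks own fault boundary elements.
  bool hasFault = (rank % 2 == 0);

  // Ranks without fault elements pass MPI_UNDEFINED and receive MPI_COMM_NULL.
  MPI_Comm faultComm;
  MPI_Comm_split(MPI_COMM_WORLD, hasFault ? 0 : MPI_UNDEFINED, rank, &faultComm);

  if (faultComm != MPI_COMM_NULL) {
    // Collectives that only concern the fault (e.g. a global minimum of some
    // fault quantity) run on faultComm, so ranks without a fault boundary
    // never enter the call and no mismatch can occur.
    double localValue = 1.0; // some per-rank fault quantity
    double globalMin;
    MPI_Allreduce(&localValue, &globalMin, 1, MPI_DOUBLE, MPI_MIN, faultComm);
    MPI_Comm_free(&faultComm);
  }

  MPI_Finalize();
  return 0;
}
```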

@daisy20170101
Contributor

I have located the error at src/Initializer/time_stepping/MultiRate.hpp, line 120.
It seems that when MPI passes a variable the sizes do not match: 260 bytes are received but the buffer is only 8 bytes. I have no idea why this happens, though.
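
For context, this kind of truncation error typically shows up when the ranks do not post matching collective calls. A minimal sketch of one way such a mismatch can arise (hypothetical code, not the SeisSol code):

```cpp
#include <mpi.h>

// Hypothetical illustration: every rank must call MPI_Allreduce with matching
// counts. If one rank reduces a single double while the others reduce a larger
// array, the messages exchanged inside the reduction no longer match, and
// MPICH aborts with a "Message truncated" error similar to the one above.
int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    // Rank 0 reduces one double (an 8-byte receive buffer) ...
    double sendBuf = 0.0;
    double recvBuf = 0.0;
    MPI_Allreduce(&sendBuf, &recvBuf, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
  } else {
    // ... while the other ranks reduce 32 doubles, so rank 0 receives far
    // more bytes than its buffer can hold.
    double sendBuf[32] = {0.0};
    double recvBuf[32] = {0.0};
    MPI_Allreduce(sendBuf, recvBuf, 32, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);
  }

  MPI_Finalize();
  return 0;
}
```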

@git-taufiq
Contributor Author

It was solved by adding this line to the job file:
export SEISSOL_ASAGI_MPI_MODE=OFF

@Thomas-Ulrich
Contributor

This solves your problem, but the bug in ASAGI is not fixed; therefore the issue is reopened.
