
Crash with Intel MPI 2019.1 #1158

Closed
prckent opened this issue Nov 11, 2018 · 5 comments
prckent commented Nov 11, 2018

It is not clear whether this is a problem with Intel MPI, the support libraries, or QMCPACK, but on oxygen there is a crash at startup that did not occur with the 2019.0 release:

$ export I_MPI_FABRICS="ofi"
$ yum list libfabric
Installed Packages
libfabric.x86_64                                                  1.6.1-2.el7                                                     @rhel-x86_64-workstation-7
$ mpirun -n 1 ../../../build_intel2019/bin/qmcpack simple-LiH.xml
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)

Most of the Intel tests on https://cdash.qmcpack.org/CDash/viewProjects.php will fail until this is fixed or we decide to downgrade the software.

prckent added the bug label Nov 11, 2018
ye-luo commented Nov 11, 2018

Was export I_MPI_FABRICS="ofi" used with the old Intel compiler installation on oxygen?
Could you try "export I_MPI_FABRICS=shm"?
Enabling I_MPI_DEBUG may give us some insight.

prckent commented Nov 11, 2018

We never used to set this, i.e. the behavior changed between 2019.0 and 2019.1.

"shm" is no longer supported. "shm:ofi" also fails. This setting might be a decoy but I explored it due to reported problems on the Intel support site.

prckent commented Nov 11, 2018

$ export I_MPI_FABRICS=shm:ofi
$ export I_MPI_DEBUG=1
$ ../../../build_intel2019/bin/qmcpack
[0] MPI startup(): Imported environment partly inaccesible. Map=0 Info=0
[0] MPI startup(): libfabric version: 1.7.0a1-impi
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(639)......:
MPID_Init(860).............:
MPIDI_NM_mpi_init_hook(689): OFI addrinfo() failed (ofi_init.h:689:MPIDI_NM_mpi_init_hook:No data available)

Note the inconsistency in the libfabric versions: the system package is 1.6.1, while Intel MPI reports its bundled 1.7.0a1-impi.

prckent commented Nov 13, 2018

An MPI hello world also fails, so happily this is not a QMCPACK problem.

https://software.intel.com/en-us/node/799716?page=0
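
For reference, a minimal MPI program along the following lines (a sketch, not necessarily the exact hello world used) is enough to hit the failure, since the abort happens inside MPI initialization before any application code runs:

#include <mpi.h>
#include <stdio.h>

/* Minimal MPI "hello world": the only nontrivial call is MPI_Init,
 * which is where the affected Intel MPI 2019.1 installation aborts. */
int main(int argc, char **argv)
{
    int rank = 0, size = 0;

    MPI_Init(&argc, &argv);               /* aborts here on the broken setup */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Compiled with the Intel MPI wrapper (e.g. mpiicc hello.c -o hello) and launched with mpirun -n 1 ./hello, it fails at startup, which is what rules out QMCPACK.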

prckent commented Nov 19, 2018

Fixed via https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/799716

Added export FI_PROVIDER=tcp to the oxygen scripts to bypass this bug in Intel MPI 2019.1

prckent closed this as completed Nov 19, 2018