Intel MPI failure with EFA enabled #1988
Comments
Hi William, looking into it.
To add, I can run the full IMPI benchmark suite on the master node successfully if I do not specify
The previous working configuration created by @JiaweiZhuang used ParallelCluster 2.4.1, so I'm not sure if something changed in ParallelCluster or Intel MPI in the time since that requires my environment to be set up differently.
I believe I've narrowed down the issue: it turns out I cannot complete the full IMPI benchmark suite on the compute node with
I apologize for the delayed response here, and thank you for the update and detailed logs. We've been able to reproduce the memory registration error you are seeing with EFA and are investigating a fix for this. You're correct that the error you're seeing with GEOS-Chem and the IMB-EXT Window test is the same bug.
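For anyone reproducing this, a minimal sketch of the kind of two-rank invocation that exercises the failing path (assuming the `IMB-EXT` binary from the Intel MPI Benchmarks is on the PATH and EFA is available; exact paths and launch flags will vary by cluster):

```bash
# Run only the one-sided "Window" benchmark across two ranks on two nodes.
# With FI_PROVIDER=efa this hits the memory registration error;
# with FI_PROVIDER=tcp the same test completes.
FI_PROVIDER=efa srun --nodes=2 --ntasks=2 IMB-EXT Window
```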
I've been corresponding with the Libfabric community here and we've asked the Intel MPI team to look into this behavior. Intel suggested turning off direct RMA operations using the environment variable
I can confirm this works and I can now run the model to completion (though now I'm also running into
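For reference, a sketch of how a workaround variable like the one Intel suggested can be propagated to all ranks (the exact variable name is not quoted above, so `I_MPI_PLACEHOLDER_RMA_SWITCH` below is purely a placeholder, not a real option):

```bash
# Option 1: export it in the job script so Slurm propagates it to every rank.
export I_MPI_PLACEHOLDER_RMA_SWITCH=0   # placeholder; substitute the variable Intel suggested
srun ./your_mpi_app

# Option 2: pass it explicitly through Intel MPI's Hydra launcher with -genv.
mpirun -genv I_MPI_PLACEHOLDER_RMA_SWITCH 0 -n 2 ./your_mpi_app
```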
Thank you for testing that and confirming. We'll continue working on a fix for the issue.
The libfabric community had a discussion on the topic of zero-length memory registration, and the conclusion is that zero-byte memory registration is invalid behavior; the documentation has been updated accordingly. So in this case, Intel MPI is not following the libfabric standard.
I'm going to resolve this issue since there isn't anything we can do on the ParallelCluster side other than upgrading to the latest Intel MPI library once Intel aligns with the libfabric standard. We have relayed this info to Intel, which is aware of the issue.
Environment:
Bug description and how to reproduce:
I'm attempting to run the GEOS-Chem High Performance model on a ParallelCluster setup. This has been done successfully in the past with older versions of GCHP and ParallelCluster (see work by @JiaweiZhuang as documented here). I'm using a similar setup to that described in the previous link, except that I am using GNU compilers rather than Intel ones, and newer versions of Intel MPI and ParallelCluster. I have been unable to complete a Slurm-submitted run of the model using Intel MPI and EFA on ParallelCluster 2.8.0; every run fails with the following error:
This is true for both the pre-installed version of Intel MPI available in ParallelCluster 2.8.0 (2019 Update 7) and a Spack-built version of 2019 Update 8. The model runs successfully when launched with `mpirun` on only the Master node, but any submission through Slurm (either using `srun` directly or a bash script containing `srun` or `mpirun`) fails instantly. This failure only occurs when using EFA as the fabric provider; setting `FI_PROVIDER` to `tcp` or something similar results in a successful but extremely slow run (several times slower than running locally on the Master node).

Additional context:
Runs using other MPI implementations (I have tested OpenMPI and MVAPICH2) complete successfully when submitted through Slurm across one or multiple nodes, but are extremely slow compared to local Master node runs. I am not sure if the poor performance of the model with these implementations is due to a lack of EC2-specific optimization as described in #1436, or if I am simply missing some necessary configuration steps. Assuming the former, I would like to get Intel MPI functional (and avoid using Intel compilers, due to their licensing requirements) for running the model on EC2. I will note that the model performs roughly equally (and actually runs successfully on multiple nodes) across different MPI implementations, including Intel MPI, on my institution's local computing cluster.
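One check that may help distinguish "EFA not being picked up at all" from "EFA available but the run is slow anyway" is to ask libfabric which providers it sees on a compute node (a sketch, assuming the `fi_info` utility from libfabric is installed on the compute nodes, which I believe is the case on EFA-enabled ParallelCluster AMIs):

```bash
# Show whether the EFA provider is visible to libfabric on a compute node;
# if this prints nothing, MPI jobs will silently fall back to tcp/sockets providers.
srun --nodes=1 --ntasks=1 fi_info -p efa

# Listing everything can also help when experimenting with FI_PROVIDER values.
srun --nodes=1 --ntasks=1 fi_info
```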
I've pasted my environment setup and other settings for the cluster above (I've tried most configurations of the commented-out options). I've attached my Slurm job submission script and a lengthy output file containing `I_MPI_DEBUG` and `FI_LOG_LEVEL` output as well. Let me know if you would like any additional clarification / would like me to run any more specific tests.

slurm_script.txt
run_output.txt
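For anyone who doesn't want to open the attachment, the submission script is along the lines of the sketch below (a simplified illustration rather than the exact contents of slurm_script.txt; the module name, rank counts, and executable name are assumptions specific to my setup):

```bash
#!/bin/bash
#SBATCH --job-name=gchp
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36
#SBATCH --exclusive

# Assumption: ParallelCluster exposes its Intel MPI install as an "intelmpi" module.
module load intelmpi

# Select the EFA libfabric provider and turn up diagnostics so the
# memory registration error shows up in the job output.
export FI_PROVIDER=efa
export I_MPI_DEBUG=5
export FI_LOG_LEVEL=debug

# "gchp" is a placeholder for the GEOS-Chem High Performance executable.
srun ./gchp > run_output.txt 2>&1
```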