
OpenMPI on CentOS-HPC 7.9: unable to use ucx because UCP worker does not support MPI_THREAD_MULTIPLE #169

Closed
ltalirz opened this issue Oct 7, 2022 · 2 comments


ltalirz commented Oct 7, 2022

I am unable to use the UCX messaging layer together with OpenMPI for an application that supports MPI + OpenMP parallelization strategies.

While the UCX pml component is found, initializing the component fails:

mpirun --mca pml ucx --mca pml_base_verbose 20 application
...
[hc44-low-2:20718] mca: base: components_register: registering framework pml components
[hc44-low-2:20718] mca: base: components_register: found loaded component ucx
[hc44-low-2:20718] mca: base: components_register: component ucx register function successful
[hc44-low-2:20718] mca: base: components_open: opening pml components
[hc44-low-2:20718] mca: base: components_open: found loaded component ucx
[hc44-low-2:20718] mca: base: components_open: component ucx open function successful
[hc44-low-2:20718] select: initializing pml component ucx
[hc44-low-2:20718] select: init returned failure for component ucx
...

When I export OMPI_MCA_pml_ucx_verbose=10, I am notified that

[hc44-low-2:21571] pml_ucx.c:325 UCP worker does not support MPI_THREAD_MULTIPLE. PML UCX could not be selected

This happens even when I export OMP_NUM_THREADS=1 (I guess this is independent of whether multiple threads are actually used).

I read in openucx/ucx#5284 (comment) that I may need UCX to be built with the --enable-mt option.
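One way to check this is via ucx_info, which ships with UCX. A sketch, assuming ucx_info is on PATH (the exact output format varies between UCX releases):

```shell
# Print UCX version information; recent releases include the configure
# line used at build time, so --enable-mt should appear there if it was set.
ucx_info -v

# The build-configuration dump can also be searched for the
# multi-threading flag (the exact macro name may differ between versions):
ucx_info -b | grep -i mt
```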

Would it be possible to have the UCX that ships with CentOS-HPC built with the --enable-mt option?

Or is this already the case and I am barking up the wrong tree here?
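For reference, building UCX from source with multi-threaded workers enabled follows the standard UCX configure flow. A sketch; the install prefix is illustrative, not prescriptive:

```shell
# Build UCX with multi-threaded UCP workers enabled.
./autogen.sh
./contrib/configure-release --prefix=/opt/ucx-mt --enable-mt
make -j"$(nproc)" && make install

# OpenMPI then needs to be configured against this UCX, e.g.:
#   ./configure --with-ucx=/opt/ucx-mt ...
```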

jithinjosepkl commented

@ltalirz, any reason we cannot use HPC-X (which is OMPI + UCX) here? The mt-init scripts are in the HPC-X folder (/opt/hpcx*), and the multi-threaded UCX build is there as well.
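If the HPC-X route works, the usual pattern is to source the multi-threaded init script before loading the environment. A sketch; the exact directory name under /opt is an assumption:

```shell
# Load the multi-threaded HPC-X environment (OMPI + UCX built with
# --enable-mt). The directory name is illustrative.
source /opt/hpcx-*/hpcx-mt-init.sh
hpcx_load

# Afterwards, pml/ucx should accept MPI_THREAD_MULTIPLE:
mpirun --mca pml ucx ./application
```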


ltalirz commented Oct 27, 2022

Hi @jithinjosepkl , sorry for the long delay on our side.

First, a counter-question: is there a reason the OpenMPI in /opt is not compiled against a UCX build with thread support? Would you be open to a PR that fixes this?

Second, we tried using the OpenMPI from HPC-X but ran into a number of issues.

  1. The HPC-X module files are broken, see "HPCX and IMPI modules not compatible with Lmod" #80. PR #102 ("hpcx and impi modulefiles iso trying to load from outside module…"), which tried to fix this, was closed by you without follow-up (?)
  2. After fixing this, mpicc and mpifort from the mpi/hpcx module pick up the outdated gcc 4.8.5 from the CentOS image, which fails to compile even a simple MPI program (even though ompi_info reports that the OpenMPI bundled with HPC-X was built with gcc 9.2.0).
    This can be fixed by explicitly loading both modules: module load gcc-9.2.0 mpi/hpcx (we suggest making gcc-9.2.0 a dependency of mpi/hpcx).
  3. Even after this step, we still run into errors when using the HPC-X OpenMPI as an external MPI library in Spack, since the libtool .la files shipped with HPC-X point to non-existent libraries under /hpc (the paths that were used to compile HPC-X).
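Point 3 can be checked mechanically. A sketch of a hypothetical helper that scans libtool archives for recorded build-time paths that no longer exist; the /opt/hpcx-*/ompi/lib location in the usage line is an assumption:

```shell
# check_la_paths DIR: report libtool .la files under DIR that reference
# paths beginning with /hpc which do not exist on this machine
# (such paths were only valid on the HPC-X build host).
check_la_paths() {
  for la in "$1"/*.la; do
    [ -e "$la" ] || continue
    grep -o "/hpc[^ ']*" "$la" | while read -r p; do
      [ -e "$p" ] || echo "$la references missing $p"
    done
  done
}

# Example usage (path is an assumption):
# check_la_paths /opt/hpcx-*/ompi/lib
```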

Given your comment in #102, we understand that we should switch to AlmaLinux 8.6.
We will open a new issue, should any of these issues persist there.
cc @matt-chan
