libpmi.so is missing in pcluster 2.3.1 with Slurm 18.08.6 (required by OpenMPI) #1008

Closed
JiaweiZhuang opened this issue Apr 16, 2019 · 1 comment

JiaweiZhuang commented Apr 16, 2019

Environment:

  • AWS ParallelCluster 2.3.1
  • OS: ubuntu1604 or centos7
  • Scheduler: Slurm
  • Master instance type: c5n.large
  • Compute instance type: c5n.18xlarge

Bug description and how to reproduce:
In the previous version pcluster 2.2.1 with Slurm 16.05.3, libpmi was installed properly:

ubuntu@ip-172-31-13-1:/opt/slurm/lib$ ls
libpmi.a   libpmi.so    libpmi.so.0.0.0  libslurmdb.a   libslurmdb.so     libslurmdb.so.30.0.0  libslurm.so     libslurm.so.30.0.0
libpmi.la  libpmi.so.0  libslurm.a       libslurmdb.la  libslurmdb.so.30  libslurm.la           libslurm.so.30  slurm

In pcluster 2.3.1 with Slurm 18.08.6, libpmi.so and libpmi.a are missing:

ubuntu@ip-172-31-1-198:/opt/slurm/lib$ ls
libslurm.a    libslurmdb.la  libslurmdb.so.33      libslurm.la  libslurm.so.33      slurm
libslurmdb.a  libslurmdb.so  libslurmdb.so.33.0.0  libslurm.so  libslurm.so.33.0.0

(The header file pmi.h is still in /opt/slurm/include/slurm, though.)
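
To run the same check without reading the `ls` output by eye, a small helper along these lines could be used. This is only a sketch: the function name `find_pmi_libs` is made up for illustration, and it assumes the libraries, when present, sit directly under `lib` or `lib64` of the Slurm prefix, as in the 2.2.1 listing above.

```shell
#!/bin/sh
# find_pmi_libs PREFIX (hypothetical helper, not part of Slurm or pcluster):
# print every PMI library found under PREFIX/lib or PREFIX/lib64.
# Returns 0 if at least one library was found, 1 otherwise.
find_pmi_libs() {
    prefix=$1
    found=1
    for dir in "$prefix/lib" "$prefix/lib64"; do
        for lib in "$dir"/libpmi.so* "$dir"/libpmi.a \
                   "$dir"/libpmi2.so* "$dir"/libpmi2.a; do
            # An unmatched glob stays literal, so test that the path exists.
            if [ -e "$lib" ]; then
                echo "$lib"
                found=0
            fi
        done
    done
    return "$found"
}

# Example call (prefix taken from this issue):
#   find_pmi_libs /opt/slurm || echo "no libpmi under /opt/slurm"
```

On the 2.3.1 cluster described here, this would be expected to print nothing and return non-zero, since the listing above contains no libpmi files.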

OpenMPI relies on Slurm's libpmi for launching processes (https://www.open-mpi.org/faq/?category=slurm). With pcluster 2.3.1, building OpenMPI against Slurm fails at the configure step:

$ wget https://download.open-mpi.org/release/open-mpi/v3.1/openmpi-3.1.3.tar.gz
$ tar zxf openmpi-3.1.3.tar.gz
$ cd openmpi-3.1.3
$ ./configure prefix=$HOME/openmpi --with-pmi=/opt/slurm --with-slurm
...
checking if user requested PMI support... yes
checking for pmi.h in /opt/slurm/include/slurm... found
checking pmi.h usability... yes
checking pmi.h presence... yes
checking for pmi.h... yes
checking for libpmi in /opt/slurm/include/slurm/lib... checking for libpmi in /opt/slurm/include/slurm/lib64... not found
checking for pmi2.h in /opt/slurm/include/slurm... not found
checking for pmi2.h in /opt/slurm/include/slurm/include... not found
checking for pmi2.h in /opt/slurm/include/slurm/include/slurm... not found
checking for libpmi2 in /opt/slurm/include/slurm/lib... checking for libpmi2 in /opt/slurm/include/slurm/lib64... not found
checking can PMI support be built... no
configure: WARNING: PMI support requested (via --with-pmi) but neither pmi.h
configure: WARNING: nor pmi2.h were found under locations:
configure: WARNING:     /opt/slurm/include/slurm
configure: WARNING:     /opt/slurm/include/slurm/slurm
configure: WARNING: Specified path: /opt/slurm/include/slurm
configure: WARNING: OR neither libpmi nor libpmi2 were found under:
configure: WARNING:     /lib
configure: WARNING:     /lib64
configure: WARNING: Specified path:
configure: error: Aborting

The exact same steps work with pcluster 2.2.1.

Is this a Slurm issue or something wrong with ParallelCluster's cookbook?

PS: I found this issue from a comment under my blog post: http://disq.us/p/216yn69

lukeseawalker (Contributor) commented

Hi @JiaweiZhuang,
good catch. libpmi is now in a separate package; see https://bugs.schedmd.com/show_bug.cgi?id=4511
I'm looking into it.
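
For readers who build Slurm from source and want to try a workaround in the meantime, the sketch below shows one way the PMI libraries might be installed explicitly. It rests on assumptions that are not confirmed anywhere in this thread: that the Slurm source tree has already been configured, and that it contains `contribs/pmi` and `contribs/pmi2` subdirectories with their own Makefiles. The function name `install_slurm_pmi` and the `DRY_RUN` variable are hypothetical, and this is not necessarily what the cookbook fix does.

```shell
#!/bin/sh
# install_slurm_pmi SRC_DIR (hypothetical helper):
# build and install the PMI client libraries from a configured Slurm
# source tree. Assumes SRC_DIR/contribs/pmi and SRC_DIR/contribs/pmi2 exist.
# With DRY_RUN=1 the commands are printed instead of executed.
install_slurm_pmi() {
    src_dir=$1
    for sub in contribs/pmi contribs/pmi2; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "make -C $src_dir/$sub install"
        else
            make -C "$src_dir/$sub" install || return 1
        fi
    done
}

# Example call (the source path is a placeholder):
#   DRY_RUN=1 install_slurm_pmi /path/to/slurm-source
```

Running it with `DRY_RUN=1` first shows which directories would be built before anything is written under the install prefix.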

lukeseawalker added a commit to lukeseawalker/aws-parallelcluster-cookbook that referenced this issue Apr 17, 2019
The libpmi is now in a separate slurm package see https://bugs.schedmd.com/show_bug.cgi?id=4511
so it needs to be installed explicitly

This will solve aws/aws-parallelcluster#1008

Signed-off-by: Luca Carrogu <carrogu@amazon.com>
lukeseawalker added a commit to aws/aws-parallelcluster-cookbook that referenced this issue Apr 19, 2019
lukeseawalker added a commit to lukeseawalker/aws-parallelcluster-cookbook that referenced this issue Apr 29, 2019