Scaling when parallelizing? #142

Closed

mauricemolli opened this issue Aug 22, 2019 · 8 comments

@mauricemolli

Hi Johannes,

Is there any way of finding out whether PyMultiNest is actually running in parallel?
I think I got it to work with MPI on our (SLURM-controlled) cluster.
I tweaked an example problem a bit such that the majority of the acceptance fractions during the runs are <~ 0.5.

However, when I run (or I think I run) on 2 cores, the runtime is the same as when running on a single core...

For completeness, I paste my problem setup below:

import pymultinest
import numpy as np
import time
import json

def prior(cube, ndim, nparams):

    for i in range(n_params):
        cube[i] = 50.*cube[i]-25.
    return

def loglike(cube, ndim, nparams):

    loglikelihood = 0.
    for i in range(n_params):
        loglikelihood += -0.5 * ((cube[i])/2.5)**2
    time.sleep(0.02)
    return loglikelihood

n_params = 10

t0 = time.time()
pymultinest.run(loglike, prior, n_params, outputfiles_basename='test',
                resume = False, verbose = True, importance_nested_sampling = True,
                evidence_tolerance = 0.5, init_MPI = True)

t0 = time.time() - t0

print('Time', t0) 

I let the loglike function sleep a bit to mimic a function that actually needs some time to evaluate.

This is my submit script:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --nodes=1
#SBATCH --time=168:00:00
#SBATCH --job-name=pyMN
#SBATCH --mem-per-cpu=50MB

#SBATCH --output=/shared-storage/molliere/pyMN/out/runout
#SBATCH  --error=/shared-storage/molliere/pyMN/out/runerr

module load mpi

srun --mpi=pmi2 python own_test.py

Now, I am not a SLURM expert (and I do not know if you are), so I am not 100% sure whether I am doing things correctly.

Best,
Paul

@JohannesBuchner
Owner

Look at how many times the MultiNest header appears in the output (the MultiNest version, sampling of initial live points, etc.). If it appears n times, you are not using MPI but running n programs that do not communicate. If you see the header only once, it is OK.
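
For example, if the run output is captured in a log file, a rough sketch like the one below counts the banner. The log-file path and the "MultiNest v" marker are assumptions; adjust them to whatever your run actually writes.

# Count how often the MultiNest startup banner shows up in a captured log.
# Both the path and the "MultiNest v" marker are assumptions; adapt them.
logfile = "/shared-storage/molliere/pyMN/out/runout"
with open(logfile) as f:
    n_banners = sum("MultiNest v" in line for line in f)
print("MultiNest banner seen %d time(s)" % n_banners)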

@mauricemolli
Author

Wow, thanks for the swift reply. The header indeed appears only once, so that is reassuring. However, since I do not use mpiexec -np 2 or mpirun -n 2, the only place where I specify that I want to run on two cores is the SLURM --cpus-per-task=2 option. So I am not sure whether it simply ignores all of this and runs on a single core, or whether it does run on two cores after all.

@JohannesBuchner
Owner

Maybe print out the size and rank with a short Python script.

https://mpi4py.readthedocs.io/en/stable/tutorial.html#collective-communication

from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
print("size = %d, rank = %d" % (size, rank))
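
If you save this as, say, mpi_check.py and launch it with mpiexec -n 2 (or the equivalent srun call), each process should report a different rank and size = 2; if every process reports size = 1 and rank = 0, the tasks are not communicating.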

@mauricemolli
Author

Hi Johannes,

Thanks a lot.

I think something is not right on our cluster.
First of all, they didn't have mpi4py installed (they do now).

Now if I run

from __future__ import print_function
from mpi4py import MPI


comm = MPI.COMM_WORLD

print("Hello! I'm rank %d from %d running in total..." % (comm.rank, comm.size))

comm.Barrier()   # wait for everybody to synchronize _here_

I get

[1566552065.016049] [bachelor-node03:63661:0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1566552065.020486] [bachelor-node03:63663:0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1566552065.025746] [bachelor-node03:63662:0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
[1566552065.030674] [bachelor-node03:63664:0]            sys.c:618  UCX  ERROR shmget(size=2097152 flags=0xfb0) for mm_recv_desc failed: Operation not permitted, please check shared memory limits by 'ipcs -l'
Hello! I'm rank 3 from 4 running in total...
Hello! I'm rank 1 from 4 running in total...
Hello! I'm rank 0 from 4 running in total...
Hello! I'm rank 2 from 4 running in total...

So it looks as if it is using MPI, but there are some errors. And it does not work with PyMultiNest, see below.

For completeness, this is my submit script:

#!/bin/bash

#SBATCH -n 4
#SBATCH --time=168:00:00
#SBATCH --job-name=pyMN

#SBATCH --mail-type=ALL
#SBATCH --mem-per-cpu=50MB

# stderr and stdout
#SBATCH --output=/shared-storage/molliere/mpi_test/runout
#SBATCH  --error=/shared-storage/molliere/mpi_test/runerr

module load openmpi-4.0.1

# run the application ; here your programs
mpiexec -n 4 python mpi4py_test.py

If I run PyMultiNest on 4 cores, I see the startup text 3 (!) times, and the above errors (UCX ERROR...).

@JohannesBuchner
Owner

Maybe try without the barrier, and double-check that MultiNest was compiled with MPI (libmultinest_mpi.so must exist).
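
For a quick check from Python, something along these lines works; the candidate directories are only examples and depend on where MultiNest was installed on your system:

# Look for the MPI-enabled MultiNest library. The directories below are only
# examples; adjust them to wherever MultiNest is installed on your system.
import glob, os
candidates = ["/usr/local/lib", os.path.expanduser("~/MultiNest/lib")]
candidates += os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
for d in candidates:
    if d:
        print(d, "->", glob.glob(os.path.join(d, "libmultinest_mpi*")))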

@JohannesBuchner
Owner

You could also take one or several of the example programs from the page I linked to, and go to your cluster admin with the error messages. Possibly you need to tell SLURM more precisely what resources you intend to use?
I am also not sure whether SBATCH -n 4 is redundant with the mpiexec call. This page, https://hpc-wiki.info/hpc/SLURM, indicates that you should use srun rather than mpiexec?

@mauricemolli
Author

Hi Johannes,

Thanks again for your help.
Two things helped in the end:

  1. libmultinest_mpi.so did not exist, but our IT eventually managed to get cmake to find OpenMPI when building MultiNest.
  2. Issue "Running pyMultiNest with MPI" #105 also helped me find a SLURM submit script that works. I now use:
#!/bin/bash
#SBATCH --job-name=pyMultiNestMPI
#SBATCH --output=pyMultiNestMPI-%j.out
#SBATCH --error=pyMultiNestMPI-%j.err
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=01:00:00
#SBATCH --mem=15G

LD_LIBRARY_PATH=/usr/local2/misc/multinest/MultiNest/lib:$LD_LIBRARY_PATH
LD_LIBRARY_PATH=/usr/local2/misc/ucx/ucx-1.5.2/build/lib:$LD_LIBRARY_PATH
module load openmpi-4.0.1

mpiexec -n 4 python own_test_issue_105.py

The LD_LIBRARY_PATH and module load openmpi-4.0.1 are obviously specific to our system.

Then, running on 1 core gives:
('Time', 365.6345908641815)

Running on 4 cores gives:
('Time', 115.94996094703674)

Running on 10 cores gives:
('Time', 54.59888315200806)
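
That is roughly a 3.2x speed-up on 4 cores and a 6.7x speed-up on 10 cores compared to the single-core run, so the scaling looks good, if not perfectly linear.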

I could not be happier...
Looking forward to sampling something exciting next!

Best,
Paul

@gabrielastro

gabrielastro commented Feb 9, 2024

Johannes, @mauricemolli, thank you for this very helpful thread! I had similar problems: either N independent tasks ran, or there were messages such as:

>>> from mpi4py import MPI
[node33:3557954] OPAL ERROR: Unreachable in file ext3x_client.c at line 112
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[node33:3557954] Local abort before MPI_INIT completed completed successfully,
but am not able to aggregate error messages, and not able to guarantee
that all other processes were killed!

or just nothing happened, and so on. The solution included, besides exporting LD_LIBRARY_PATH to where the libraries are (found via find /usr -name libmultinest.so, or in ~ or /lib or…; same for libmultinest_mpi):

  • Modifying the test script own_test_issue_105.py above to have …, init_MPI = False)! (A concrete sketch follows below, after this list.) It was most certainly changed before getting it to run (i.e., True does not work) and the post was simply not updated 🕵️. Indeed, as noted in the documentation, "init_MPI should be set to False, because importing mpi4py initialises MPI already".

  • Running the script not from within python but through mpirun: e.g., sbatch -n 8 Skript.sh, where Skript.sh contains (LD_LIBRARY_PATH gets set in .bashrc):

#!/usr/bin/bash
mpirun.openmpi python3 own_test_issue_105_MODIFIED.py

  • Using the same MPI version to run as was used at compile time; one cannot mix OpenMPI and MPICH. On our cluster, for instance, both mpirun.openmpi and mpirun.mpich are available:
lrwxrwxrwx 1 root 7 27. Mai 2021  /usr/bin/mpirun.openmpi -> orterun
lrwxrwxrwx 1 root 13  1. Feb 2022  /usr/bin/mpirun.mpich -> mpiexec.hydra

Depending on the cluster set-up, it can be necessary to use the full explicit path to mpirun or mpiexec, even if using module load <some version of OpenMPI/MPICH>! Surprisingly, the mpirun/mpiexec executables might not correspond to the MPI libraries with which MultiNest was compiled.
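
As a concrete sketch of the init_MPI change from the first bullet above: this is essentially the test script from the top of the thread (with the time.sleep removed for brevity), changed only to import mpi4py and pass init_MPI = False.

# Minimal sketch: mpi4py initialises MPI on import, so pymultinest must be
# told not to initialise it again, hence init_MPI = False.
from mpi4py import MPI   # importing mpi4py is what sets up MPI
import pymultinest

n_params = 10

def prior(cube, ndim, nparams):
    # map the unit cube to a flat prior on [-25, 25]
    for i in range(nparams):
        cube[i] = 50. * cube[i] - 25.

def loglike(cube, ndim, nparams):
    # independent Gaussians with sigma = 2.5, as in the original test
    return -0.5 * sum((cube[i] / 2.5)**2 for i in range(nparams))

pymultinest.run(loglike, prior, n_params, outputfiles_basename='test',
                resume = False, verbose = True, importance_nested_sampling = True,
                evidence_tolerance = 0.5, init_MPI = False)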

Note:

  • One can install MultiNest and PyMultiNest as a regular user (i.e., no need to be root).

  • If needed, one can force pymultinest to run serially despite mpi4py being installed by doing:

import sys
sys.modules['mpi4py'] = None  # must happen before pymultinest is imported

I hope these notes can help someone! Changing init_MPI to False in the test above was the main point. By the way, before n_params = 10 I also added:

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
print(">>>> size %d, rank = %d\n" % (size,rank) )

Gabriel
