Scaling when parallelizing? #142
Comments
Look at how many times the MultiNest header appears in the output (MultiNest version, sampling initial live points, etc.). If it appears n times, it is not using MPI but running n independent programs that do not communicate. If you see the header only once, it is fine.
Wow, thanks for the swift reply. It indeed appears only once, so that is reassuring. However, since I do not use mpiexec -np 2 or mpirun -n 2, the only place where I specify that I want to run on two cores is the SLURM --cpus-per-task=2 option. So I am not sure whether it just ignores all this and runs on a single core, or whether it really does run on two cores.
Maybe print out the size and ranks with a short Python script: https://mpi4py.readthedocs.io/en/stable/tutorial.html#collective-communication
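For example, a minimal sketch along the lines of that tutorial (launch it exactly the way you launch PyMultiNest, e.g. via your SLURM submit script):

```python
# mpi_check.py -- print this process's MPI rank and the communicator size
from mpi4py import MPI

comm = MPI.COMM_WORLD
print('rank %d of %d' % (comm.Get_rank(), comm.Get_size()))
```

With working MPI you should see one line per process, all reporting the same size; if every line reports size 1, the processes are not communicating.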
Hi Johannes, Thanks a lot. I think something is not right on our cluster. Now if I run
I get
So it looks as if it is using MPI, but there are some errors. And it does not work with PyMultiNest, see below. For completeness, this is my submit script:
If I run PyMultiNest on 4 cores, I see the startup text 3 (!) times, and the above errors (UCX ERROR...).
Maybe try without the barrier and double-check that MultiNest was compiled with MPI (libmultinest_mpi.so must exist).
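One quick way to check the latter from Python (a sketch; it loads the library through the dynamic loader, so your LD_LIBRARY_PATH applies):

```python
# check that the MPI build of MultiNest can be found by the dynamic loader
from ctypes import cdll

try:
    cdll.LoadLibrary('libmultinest_mpi.so')
    print('libmultinest_mpi.so found and loaded')
except OSError as err:
    print('could not load libmultinest_mpi.so:', err)
```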
You could also take one or several of the example programs from the page I linked to and go to your cluster admin with the error messages. Possibly you need to instruct SLURM a bit better about your resource intentions?
Hi Johannes, Thanks again for your help.
The LD_LIBRARY_PATH and module load openmpi-4.0.1 are obviously specific to our system. Then, running on 1 core gives:
Running on 4 cores gives:
Running on 10 cores gives:
I could not be happier... Best,
Johannes, @mauricemolli, thank you for this very helpful thread! I had similar problems: either N independent tasks ran, or there were messages such as:
or just nothing happened, and so on. The solution includes, besides
Depending on the cluster set-up, it can be necessary to use the full explicit path to
Note:
I hope these notes can help someone! Changing
Gabriel
Hi Johannes,
Is there any way of finding out whether PyMultiNest is actually running in parallel?
I think I got it to work with MPI on our (SLURM-controlled) cluster.
I tweaked an example problem a bit such that the majority of the acceptance fractions during the runs are <~ 0.5.
However, when I run (or I think I run) on 2 cores, the runtime is the same as when running on a single core...
For completeness, I paste my problem setup
I let the loglike function sleep a bit to mimic a function that actually needs a bit of time.
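The exact listing is not preserved in this thread; a minimal sketch of such a setup, assuming the standard pymultinest.run() interface (the parameter ranges, sleep duration, and chains/test_ output prefix below are illustrative):

```python
import os
import time
import pymultinest

def prior(cube, ndim, nparams):
    # map the unit cube to uniform priors on [-10, 10] for every parameter
    for i in range(ndim):
        cube[i] = cube[i] * 20.0 - 10.0

def loglike(cube, ndim, nparams):
    # artificial delay to mimic a likelihood that actually takes some time
    time.sleep(0.01)
    # simple Gaussian centred on the origin
    return -0.5 * sum(cube[i] ** 2 for i in range(ndim))

# MultiNest needs the output directory to exist before it starts writing
os.makedirs('chains', exist_ok=True)

pymultinest.run(loglike, prior, 2,
                outputfiles_basename='chains/test_',
                resume=False, verbose=True)
```

Launched with something like mpiexec -n 2 python test_scaling.py (or the equivalent srun command), the sleeping likelihood should roughly halve the wall-clock time if MPI is actually being used.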
This is my submit script:
Now, I am not a SLURM expert, and I do not know if you are, but I am not 100% sure whether I am doing things correctly.
Best,
Paul