Feature optimise across nodes #542
Conversation
That would be a major limitation. But simply telling …
Looks like a good start to me. There is obviously room for improving efficiency by going for dynamic scheduling, but that can be extended later.
Please extend the documentation on how to use it (as posted in the PR description).
Do you already have any benchmarking / scaling results? Would be interesting to see.
pypesto/engine/mpi_pool.py (outdated)
logger.info(f"Performing parallel task execution on {n_procs} " | ||
f"nodes with chunksize of {self.chunksize}.") |
Communicator size is not necessarily the number of nodes.
So I do think it is, at least when I tested it: if I allocated e.g. 5 nodes in my slurm file, the communicator size was also 5. Not sure whether this is exactly what you meant, though.
It depends on how you invoke mpiexec. As mentioned, I don't see any obvious problem with having one MPI rank per core instead of per node. The slightly more expensive communication should be overcompensated by not wasting one full node.
Might be because I invoked mpiexec with -n [Number] that it always only ran one process per node.
Had the idea as well, tested some things that did not work, but will try a bit more.
Codecov Report
@@            Coverage Diff             @@
##           develop     #542      +/-   ##
===========================================
+ Coverage    88.16%   88.56%    +0.40%
===========================================
  Files           79       87        +8
  Lines         5257     5380      +123
===========================================
+ Hits          4635     4765      +130
+ Misses         622      615        -7
Continue to review full report at Codecov.
I updated the … I think it would be good to have a description somewhere on how to use it exactly; where can I put something like that?
Looks good, but would be great to include a test case (which would also serve as usage example).
Can I write a test which requires a slurm file and a cluster and still have it pass as a regular pytest test? That was the main issue I had.
It shouldn't require slurm, just mpiexec should do the job, no? I see that it's a bit inconvenient. Easiest might be creating a test script that runs a (cheap, e.g. Rosenbrock) optimization using the MPIPoolEngine and writing the result to a file. From a pytest test case you could then launch that script via mpiexec and check the written result.
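A minimal sketch of what such a test could look like, assuming a helper script mpi_optimize_script.py that runs the optimization and writes its result to temp_result.h5 (both names are illustrative, not the actual PR code):

import subprocess
import sys


def test_optimize_with_mpipoolengine():
    # launch the helper script under mpiexec with two ranks;
    # the script is expected to write its result to temp_result.h5
    subprocess.check_call([
        'mpiexec', '-n', '2', sys.executable, '-m', 'mpi4py.futures',
        'mpi_optimize_script.py',
    ])
    # ...load temp_result.h5 here and compare it against a reference
    # single-process optimization...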
OK, so the test seems to work. @dweindl I noticed that if I run the slurm file on Bonna, it does not stop although the program has finished.
test/optimize/test_optimize.py (outdated)
os.system('rm temp_result1.h5')
os.system('rm temp_result2.h5')
Use os.remove() / os.unlink(), or the corresponding functions from pathlib.
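For illustration, the cleanup could then look like this (a sketch of the suggested change, keeping the file names from the snippet above):

import os

# delete the temporary result files directly instead of shelling out to `rm`
os.remove('temp_result1.h5')
os.remove('temp_result2.h5')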
Looks good.
Hey,
I implemented an engine that can optimise across nodes. I also tested it for a small model (benchmark collection: Zhao) and got a definite speedup; the exact speedup most likely depends on the complexity of the model.
I also wanted to add the engine to the test_engine.py file, but one cannot test it without using multiple nodes, so that probably would not make too much sense? Any suggestions on that?
Lastly, some things to note:
In Bonna, srun and MPI do not work together and raise an error. This already happens as soon as mpi4py is imported, so for convenience I did not add the MPIPoolEngine to the __init__.py file. When using it, one therefore has to import it separately via from pypesto.engine.mpi_pool import MPIPoolEngine.
Since srun does not work, one can use in the slurm file: mpiexec -n [number of nodes] python -m mpi4py.futures [filename]. Additionally, all the work before the optimisation should be done on only one node; for this, one can put the whole code as it is inside an if __name__ == '__main__': block.
Currently the optimisation works in the following way: one node is considered the master while the others are considered the workers. The master sends the workers batches of tasks (of size #CPUs per node), which they in turn execute with a multiprocessing Pool. Thus, currently a whole node is only coordinating and not working; that would be a next step, and I would be happy for any suggestions.
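For illustration, a minimal usage sketch following the description above; the Rosenbrock setup, the file name and the exact pypesto calls are assumptions (they may differ between pypesto versions and from the actual test script in this PR):

# run_mpi_optimization.py (hypothetical file name)
import numpy as np
from scipy.optimize import rosen

import pypesto
import pypesto.optimize as optimize
from pypesto.engine.mpi_pool import MPIPoolEngine  # not exported via __init__.py

if __name__ == '__main__':
    # everything before the optimisation runs only once, on the master rank
    objective = pypesto.Objective(fun=rosen)
    problem = pypesto.Problem(objective=objective,
                              lb=-5 * np.ones(2), ub=5 * np.ones(2))
    result = optimize.minimize(problem=problem, n_starts=10,
                               engine=MPIPoolEngine())

This would then be submitted as described above, e.g. with mpiexec -n [number of nodes] python -m mpi4py.futures run_mpi_optimization.py.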