add patch for OpenMPI 4.1.1 to support building using --with-cuda=internal #15528
Conversation
Allow building Open MPI with --with-cuda=internal by providing an internal minimal cuda.h header file. This eliminates the CUDA build dependency; as long as the runtime CUDA version is 8.0+, libcuda.so will be dlopen'ed and used successfully.
I can also add a speedup patch, on Tuesday at the latest.
LGTM. Should we wait for the performance patch or do that in a separate PR?
The overhead was present even when no GPU was available.
That's a seriously large patch (although simple to follow). Have you suggested it upstream too?
@akesandgren open-mpi/ompi#10364 (I assume the plan is to update the PR with the latest cleaned-up patch?)
Test report by @Micket |
@akesandgren the upstream patch is only item 1. I wanted some feedback first before committing the rest. I can of course make this patch smaller (or split it in three), but it's a trade-off.
Test report by @branfosj |
@boegelbot please test @ jsc-zen2 |
@SebastianAchilles: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster PR test command '
Test results coming soon (I hope)...
- notification for comment with ID 1137859624 processed
Message to humans: this is just bookkeeping information for me,
Test report by @boegelbot |
lgtm
I ran through all expected OSU tests:
#!/usr/bin/env bash
#SBATCH -n 2 -N 2
#SBATCH --gpus-per-node=A40:4
#SBATCH -t 1:00:00
ml OSU-Micro-Benchmarks/5.7.1-gompi-2021a-CUDA-11.3.1
for t in osu_bibw osu_bw osu_latency osu_mbw_mr osu_multi_lat osu_allgather osu_allgatherv osu_allreduce osu_alltoall osu_alltoallv osu_bcast osu_gather osu_gatherv osu_reduce osu_reduce_scatter osu_scatter osu_scatterv osu_iallgather osu_ialltoall osu_ibcast osu_igather osu_iscatter osu_alltoall osu_allreduce osu_reduce osu_alltoall
do
echo "Running ${t}"
mpirun ${t} -d cuda D D
mpirun ${t} -d cuda H D
mpirun ${t} -d cuda D H
mpirun ${t} -d cuda H H
done
and it all worked without errors.
Anyone else wants to have a second check? I'm good with this getting merged.
I tested with OSU 5.9 (#15343 and #15344). In addition to the above, this also allows running the NCCL-based tests:
for t in osu_nccl_bibw osu_nccl_bw osu_nccl_latency osu_nccl_allgather osu_nccl_allreduce osu_nccl_bcast osu_nccl_reduce osu_nccl_reduce_scatter osu_nccl_reduce osu_nccl_allreduce
do
echo ${t}
mpirun -np 2 ${t} -d cuda D D
done
These were all fine. So LGTM :)
Going in, thanks @bartoldeman! |
I was running a benchmark on a CPU node with I think I'll just set |
Did this CPU node have the CUDA runtime installed? Otherwise I don't think smcuda would have even been enabled at all.
Not that I can tell - this was on benchmarking system provided by a vendor, running Ubuntu, but the only thing nvidia related I can find is
|
If this needs further discussion, let's open an issue for it; discussing things in a merged PR is difficult to keep track of...
done, #17854 |
@Micket if I understand https://github.com/open-mpi/ompi/blob/9216ad4c49a16b3134d0dc47d8aa623f569d45ae/opal/mca/btl/smcuda/README.md correctly, |