6-H #2
Merged
27 commits
- 6a7ffa7 HOST NVSHMEM EXAMPLE
- f7b0db0 boundary conditions is not working correctly yet
- cbc3d0b Fixed rounding error due to not correctly waiting for the compute ker…
- 2b0f180 First draft of the MPI Overalp version to be tested
- 6789073 Tested and working properly, also added Instructions.md and copy.mk
- 08b72c8 first draft of NCCL version, needs to be tested for correctness
- 495b91f Merge branch 'main' of https://github.com/simongdg/tutorial-multi-gpu…
- f599f89 NCCL and host-side NVSHMEM with overlap first version, both need to b…
- bceb8fa 6-H Pull request edits have been implemented, 8-H NCCL is also ready …
- 2ad3188 Update 8-H_NCCL_NVSHMEM/NCCL/Instructions.md (simongdg)
- f3aa824 Update 8-H_NCCL_NVSHMEM/NCCL/copy.mk (simongdg)
- 65f4a4a Update 8-H_NCCL_NVSHMEM/NCCL/jacobi.cpp (simongdg)
- 87fa989 Update 6-H_Overlap_Communication_and_Computation_MPI/jacobi.cpp (simongdg)
- b29b6c4 Update 8-H_NCCL_NVSHMEM/NCCL/jacobi.cpp (simongdg)
- 8371ce4 Addressing pull request comments on TODOs spacing and instructions fo…
- c8a21dc Merge branch 'main' of https://github.com/simongdg/tutorial-multi-gpu…
- 734da80 NCCL warmup has been added as a TODO, and added to the instructions
- 78a3f92 Update 6-H_Overlap_Communication_and_Computation_MPI/jacobi.cpp (simongdg)
- 2764eed Update 6-H_Overlap_Communication_and_Computation_MPI/jacobi.cpp (simongdg)
- c96a255 Update 6-H_Overlap_Communication_and_Computation_MPI/jacobi.cpp (simongdg)
- ce10df8 Update 6-H_Overlap_Communication_and_Computation_MPI/jacobi.cpp (simongdg)
- 2a4de59 Update 8-H_NCCL_NVSHMEM/NCCL/jacobi.cpp (simongdg)
- d691b6c Update 8-H_NCCL_NVSHMEM/NCCL/jacobi.cpp (simongdg)
- 21238b8 Update 8-H_NCCL_NVSHMEM/NCCL/jacobi.cpp (simongdg)
- af0dc3d Update 8-H_NCCL_NVSHMEM/NCCL/jacobi.cpp (simongdg)
- 5361260 Fixed indentation on issue on 6-H and added jacobi.cpp to the copy.mk…
- 62bef40 Update 8-H_NCCL_NVSHMEM/NCCL/Instructions.md (simongdg)
6-H_Overlap_Communication_and_Computation_MPI/Instructions.md (46 changes: 46 additions & 0 deletions)
```diff
@@ -0,0 +1,46 @@
+# SC21 Tutorial: Efficient Distributed GPU Programming for Exascale
+
+- Time: Sunday, 14 November 2021 8AM - 5PM CST
+- Location: *online*
+- Program Link: https://sc21.supercomputing.org/presentation/?id=tut138&sess=sess188
+
+
+## Hands-On 6: Overlap Communication and Computation with MPI
+
+### Task 0: Profile the non-Overlap MPI-CUDA version of the code using Nsight Systems to discover areas of possible compute/communication overlap
+
+#### Description
+The purpose of this task is to use the Nsight Systems profiler to profile the non-Overlap MPI Jacobi solver that serves as the starting point. The objective is to become familiar with navigating the GUI and to identify possible areas where computation and communication can be overlapped.
+
+- STEPS TO BE ADDED HERE
+
+### Task 1: Overlap Communication and Computation using high priority streams and hide launch time for halo processing kernels
+
+#### Description
+
+The purpose of this task is to overlap computation and communication based on the profiling done during the previous task. The starting point of this task is the non-Overlap MPI variant of the Jacobi solver. You need to work on `TODOs` in `jacobi.cu`:
+
+- Initialize a priority range to be used by the CUDA streams
+- Create new top and bottom CUDA streams and corresponding CUDA events
+- Initialize all streams using priorities
+- Modify the original Jacobi kernel launch to not compute the top and bottom regions
+- Launch additional Jacobi kernels for the top and bottom regions using the high-priority streams
+- Wait on both top and bottom streams when calculating the norm
+- Synchronize top and bottom streams before applying the periodic boundary conditions using MPI
+- Destroy the additional CUDA streams and events before ending the application
+
+Compile with
+
+``` {.bash}
+make
+```
+
+Submit your compiled application to the batch system with
+
+``` {.bash}
+make run
+```
+
+Study the performance by glimpsing at the profile generated with
+`make profile`. For `make run` and `make profile` the environment variable `NP` can be set to change the number of processes.
+
```
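Task 1 splits each rank's Jacobi update into boundary rows and interior rows so that the boundary rows needed by the MPI halo exchange are computed first. The host-only C++ sketch below shows one way to compute that split, assuming the rank updates rows `[iy_start, iy_end)` and that rows `iy_start - 1` and `iy_end` are halo rows; `RowRange`, `LaunchPlan`, and `split_rows` are hypothetical names, not taken from `jacobi.cu`. It does not model the streams themselves: in the CUDA runtime those would come from `cudaDeviceGetStreamPriorityRange` and `cudaStreamCreateWithPriority` (numerically lower priority values mean higher priority), with `cudaEventRecord` and `cudaStreamWaitEvent` expressing the waits listed in the task.

```cpp
#include <cassert>

// Hypothetical half-open row range [begin, end) of the local block.
struct RowRange {
    int begin;
    int end;
    int size() const { return end - begin; }
};

// Hypothetical plan for the three Jacobi launches described in Task 1.
struct LaunchPlan {
    RowRange top;       // first updated row: intended for the high-priority top stream
    RowRange bottom;    // last updated row: intended for the high-priority bottom stream
    RowRange interior;  // all remaining rows: intended for the regular compute stream
};

// Assumes this rank updates rows [iy_start, iy_end) and owns at least two rows,
// so the top and bottom ranges never overlap.
LaunchPlan split_rows(int iy_start, int iy_end) {
    assert(iy_end - iy_start >= 2);
    LaunchPlan plan;
    plan.top = {iy_start, iy_start + 1};
    plan.bottom = {iy_end - 1, iy_end};
    plan.interior = {iy_start + 1, iy_end - 1};  // empty when the rank owns exactly two rows
    return plan;
}
```

For example, `split_rows(1, 9)` gives a top range `[1, 2)`, an interior range `[2, 8)`, and a bottom range `[8, 9)`. Launching the two single-row kernels on the high-priority streams lets their results be ready for the MPI exchange while the much larger interior kernel is still running.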
```diff
@@ -0,0 +1,41 @@
+# Copyright (c) 2017-2018, NVIDIA CORPORATION. All rights reserved.
+NP ?= 1
+NVCC=nvcc
+MPICXX=mpicxx
+JSC_SUBMIT_CMD ?= srun --gres=gpu:4 --ntasks-per-node 4
+CUDA_HOME ?= /usr/local/cuda
+GENCODE_SM30 := -gencode arch=compute_30,code=sm_30
+GENCODE_SM35 := -gencode arch=compute_35,code=sm_35
+GENCODE_SM37 := -gencode arch=compute_37,code=sm_37
+GENCODE_SM50 := -gencode arch=compute_50,code=sm_50
+GENCODE_SM52 := -gencode arch=compute_52,code=sm_52
+GENCODE_SM60 := -gencode arch=compute_60,code=sm_60
+GENCODE_SM70 := -gencode arch=compute_70,code=sm_70
+GENCODE_SM80 := -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80
+GENCODE_FLAGS := $(GENCODE_SM70) $(GENCODE_SM80)
+ifdef DISABLE_CUB
+NVCC_FLAGS = -Xptxas --optimize-float-atomics
+else
+NVCC_FLAGS = -DHAVE_CUB
+endif
+NVCC_FLAGS += -lineinfo $(GENCODE_FLAGS) -std=c++14
+MPICXX_FLAGS = -DUSE_NVTX -I$(CUDA_HOME)/include -std=c++14
+LD_FLAGS = -L$(CUDA_HOME)/lib64 -lcudart -lnvToolsExt
+jacobi: Makefile jacobi.cpp jacobi_kernels.o
+	$(MPICXX) $(MPICXX_FLAGS) jacobi.cpp jacobi_kernels.o $(LD_FLAGS) -o jacobi
+
+jacobi_kernels.o: Makefile jacobi_kernels.cu
+	$(NVCC) $(NVCC_FLAGS) jacobi_kernels.cu -c
+
+.PHONY: clean
+clean:
+	rm -f jacobi jacobi_kernels.o *.nsys-rep jacobi.*.compute-sanitizer.log
+
+sanitize: jacobi
+	$(JSC_SUBMIT_CMD) -n $(NP) compute-sanitizer --log-file jacobi.%q{SLURM_PROCID}.compute-sanitizer.log ./jacobi -niter 10
+
+run: jacobi
+	$(JSC_SUBMIT_CMD) -n $(NP) ./jacobi
+
+profile: jacobi
+	$(JSC_SUBMIT_CMD) -n $(NP) nsys profile --trace=mpi,cuda,nvtx -o jacobi.%q{SLURM_PROCID} ./jacobi -niter 10
```
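Judging only by the flags, the `DISABLE_CUB` switch in the Makefile above selects how `jacobi_kernels.cu` accumulates the residual norm on the GPU: a CUB-based path when `HAVE_CUB` is defined, float atomics otherwise. As a host-side reference for the quantity being accumulated, a minimal C++ sketch of one Jacobi sweep follows, assuming a row-major block with `nx` columns updated over rows `[iy_start, iy_end)` and fixed values in the first and last column; `jacobi_sweep` and `l2_norm_from_sum` are hypothetical names, not the kernel's actual interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host reference for one Jacobi sweep on a row-major block with nx columns.
// Rows [iy_start, iy_end) are updated; rows iy_start - 1 and iy_end are read-only halo
// rows (in the MPI version they would hold the neighbors' boundary rows).
// Columns 0 and nx - 1 are treated as fixed boundary values and are not updated.
// Returns the sum of squared updates over this block.
float jacobi_sweep(const std::vector<float>& a, std::vector<float>& a_new,
                   int nx, int iy_start, int iy_end) {
    assert(nx >= 3 && iy_start >= 1);
    assert(a.size() >= static_cast<std::size_t>((iy_end + 1) * nx));
    assert(a_new.size() == a.size());
    float local_sq = 0.0f;
    for (int iy = iy_start; iy < iy_end; ++iy) {
        for (int ix = 1; ix < nx - 1; ++ix) {
            // Average of the four neighbors (5-point stencil).
            const float v = 0.25f * (a[iy * nx + ix + 1] + a[iy * nx + ix - 1] +
                                     a[(iy + 1) * nx + ix] + a[(iy - 1) * nx + ix]);
            a_new[iy * nx + ix] = v;
            const float d = v - a[iy * nx + ix];
            local_sq += d * d;
        }
    }
    return local_sq;
}

// Turns a summed square into the L2 norm. With several ranks, the per-rank sums would
// first have to be added across ranks (for example with MPI_Allreduce).
float l2_norm_from_sum(float summed_sq) {
    return std::sqrt(summed_sq);
}
```

On a 5-by-5 block of zeros with a single 4.0 at the center and rows `[1, 4)` updated, the sweep sets the center to 0 and its four neighbors to 1, so the returned sum is 16 + 4 * 1 = 20.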
```diff
@@ -0,0 +1,40 @@
+#!/usr/bin/make -f
+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+TASKDIR = ../../tasks/6-H_Overlap_Communication_and_Computation_MPI/
+SOLUTIONDIR = ../../solutions/6-H_Overlap_Communication_and_Computation_MPI
+
+PROCESSFILES = jacobi.cu
+COPYFILES = Makefile Instructions.ipynb Instructions.md
+
+
+TASKPROCCESFILES = $(addprefix $(TASKDIR)/,$(PROCESSFILES))
+TASKCOPYFILES = $(addprefix $(TASKDIR)/,$(COPYFILES))
+SOLUTIONPROCCESFILES = $(addprefix $(SOLUTIONDIR)/,$(PROCESSFILES))
+SOLUTIONCOPYFILES = $(addprefix $(SOLUTIONDIR)/,$(COPYFILES))
+
+.PHONY: all task
+all: task
+task: ${TASKPROCCESFILES} ${TASKCOPYFILES} ${SOLUTIONPROCCESFILES} ${SOLUTIONCOPYFILES}
+
+
+${TASKPROCCESFILES}: $(PROCESSFILES)
+	mkdir -p $(TASKDIR)/
+	cppp -USOLUTION $(notdir $@) $@
+
+${SOLUTIONPROCCESFILES}: $(PROCESSFILES)
+	mkdir -p $(SOLUTIONDIR)/
+	cppp -DSOLUTION $(notdir $@) $@
+
+
+${TASKCOPYFILES}: $(COPYFILES)
+	mkdir -p $(TASKDIR)/
+	cp $(notdir $@) $@
+
+${SOLUTIONCOPYFILES}: $(COPYFILES)
+	mkdir -p $(SOLUTIONDIR)/
+	cp $(notdir $@) $@
+
+%.ipynb: %.md
+	pandoc $< -o $@
+# add metadata so this is seen as python
+	jq -s '.[0] * .[1]' $@ ../template.json | sponge $@
```
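copy.mk produces the task copy of the source with `cppp -USOLUTION` and the solution copy with `cppp -DSOLUTION`. A minimal sketch of what a guarded section could look like, assuming the sources mark solution code with `#ifdef SOLUTION` and leave a `TODO` in the `#else` branch; `kIsSolution` and `halo_streams_created` are hypothetical and only illustrate the convention.

```cpp
#include <cassert>

// Hypothetical illustration of the SOLUTION guard convention assumed by copy.mk:
//   cppp -DSOLUTION resolves #ifdef SOLUTION blocks in favour of the SOLUTION branch,
//   cppp -USOLUTION resolves them in favour of the #else (TODO) branch.
// Compiled directly, the branch is selected by whether SOLUTION is defined.
#ifdef SOLUTION
constexpr bool kIsSolution = true;
#else
constexpr bool kIsSolution = false;
#endif

// Number of extra high-priority halo streams set up in each variant.
int halo_streams_created() {
#ifdef SOLUTION
    // Solution: top and bottom streams are created.
    return 2;
#else
    // TODO: create the top and bottom high-priority streams.
    return 0;
#endif
}
```

Because the guards are ordinary preprocessor conditionals, the unprocessed file still compiles on its own, so the same source can be tested as either variant by passing or omitting `-DSOLUTION` to the compiler.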
Note: I'm writing up some draft steps to be included here, but I think it makes sense to merge your PR first so that we can build on top of it.