OpenMP offload
Welcome to the miniqmc for OpenMP offload wiki!
Check out the OMP_offload branch:
git checkout OMP_offload
See build options in miniQMC How-to Guides.
We introduce a new CMake option, ENABLE_OFFLOAD, to turn offloading on or off.
-DENABLE_OFFLOAD=ON # offload to accelerators like GPU
-DENABLE_OFFLOAD=OFF # default, CPU only
OFFLOAD_TARGET can be used to select an offload target if the compiler supports multiple targets, as Clang and GNU do.
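As a rough illustration of how the two options combine, the sketch below assembles a configure line. The helper name `offload_cmake_cmd` is hypothetical (it is not part of miniQMC) and only prints the command instead of running it. The `icpx` and `spir64` values are taken from the Intel example on this page.

```shell
# Hypothetical helper (not part of miniQMC): print a cmake configure line.
# $1 = C++ compiler, $2 = optional OFFLOAD_TARGET value.
offload_cmake_cmd() {
  compiler="$1"
  target="${2:-}"
  cmd="cmake -D CMAKE_CXX_COMPILER=${compiler} -D ENABLE_OFFLOAD=ON"
  # Add OFFLOAD_TARGET only when a target was requested.
  if [ -n "${target}" ]; then
    cmd="${cmd} -D OFFLOAD_TARGET=${target}"
  fi
  printf '%s ..\n' "${cmd}"
}

offload_cmake_cmd icpx spir64
offload_cmake_cmd g++
```

Leaving out the second argument falls back to the compiler's default target, which matches the simpler configure lines used for most compilers here.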
The offload feature is currently implemented in the miniqmc miniapp.
It accepts the command line arguments -g, -w, -a, -m, and -n:
-g adjusts supercell size
-w number of walkers. Equal to the number of CPU threads if not specified.
-a tiling (cache blocking) size. Equal to the number of splines if not specified.
-m spline mesh "px py pz"
-n number of iterations
The old check_spo has been renamed to check_spo_batched. The following option is only available with check_spo_batched:
-f skips transferring data back for checking. Must be used when measuring performance.
OMP_NUM_THREADS=10 ./bin/miniqmc -g "2 2 1"
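For a timing run of check_spo_batched, a minimal sketch could look like the following. It assumes the binary is built at ./bin/check_spo_batched next to miniqmc and that it accepts the same -g and -n options. Both are assumptions, so adjust the path and sizes for your build.

```shell
# Assumption: check_spo_batched sits next to miniqmc under ./bin.
BIN=./bin/check_spo_batched
if [ -x "${BIN}" ]; then
  # -f skips copying results back for checking, as required for timing.
  OMP_NUM_THREADS=10 "${BIN}" -g "2 2 1" -n 10 -f
  status=ran
else
  echo "${BIN} not found; build the OMP_offload branch first" >&2
  status=missing
fi
```

Drop -f for a correctness check, since the results then need to be copied back for comparison.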
Update on Nov 17th 2019
XL: last verified on 16.1.1-5
cmake -DCMAKE_CXX_COMPILER=xlC_r -DENABLE_OFFLOAD=ON ..
With old versions of CMake (<3.11), XL is identified as Clang. The following workaround solves the issue:
cmake -DCMAKE_CXX_COMPILER=xlC_r -DCMAKE_CXX_COMPILER_ID='XL' -DENABLE_OFFLOAD=1 ..
Clang: last verified on 16
# NVIDIA
cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=ON -D QMC_GPU_ARCHS=sm_80 ..
# AMD
cmake -D CMAKE_CXX_COMPILER=clang++ -D ENABLE_OFFLOAD=ON -D QMC_GPU_ARCHS=gfx906 ..
-D USE_OBJECT_TARGET=ON is used to work around a static linking issue. It is not needed since LLVM 15.
OneAPI: last verified on beta08
cmake -D CMAKE_CXX_COMPILER=icpx -D ENABLE_OFFLOAD=ON -D OFFLOAD_TARGET=spir64 ..
On some systems, forcing LIBOMPTARGET_PLUGIN=OPENCL is needed at runtime.
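One way to apply that runtime setting is to export the variable in the current shell before launching the miniapp. This is only a sketch: whether the variable is needed depends on the system, and the ./bin/miniqmc path and arguments follow the run example earlier on this page.

```shell
# Force the OpenCL plugin for libomptarget (only needed on some systems).
export LIBOMPTARGET_PLUGIN=OPENCL
# Run the miniapp only if it has been built at the assumed location.
if [ -x ./bin/miniqmc ]; then
  OMP_NUM_THREADS=10 ./bin/miniqmc -g "2 2 1"
fi
```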
AOMP: last verified on 17.0-0
cmake -D CMAKE_CXX_COMPILER=clang++ \
-D ENABLE_OFFLOAD=ON \
-D QMC_GPU_ARCHS=gfx906 ..
GCC: last verified on 13 develop
cmake -D CMAKE_CXX_COMPILER=g++ -D ENABLE_OFFLOAD=ON ..
Cray: last verified on 14. There is no need to load any architectural module like craype-accel-amd-gfx90a.
cmake -D CMAKE_CXX_COMPILER=crayCC \
-D ENABLE_OFFLOAD=ON \
-D OFFLOAD_TARGET=amdgcn-amd-amdhsa \
-D OFFLOAD_ARCH=gfx90a \
-D QMC_MIXED_PRECISION=ON ..
NVHPC:
cmake -DCMAKE_CXX_COMPILER=nvc++ -DENABLE_OFFLOAD=ON -DQMC_GPU_ARCHS=sm_80 -DQMC_MIXED_PRECISION=ON -DLAPACK_LIBRARIES="-llapack -lblas" -DCMAKE_EXE_LINKER_FLAGS=-pgf90libs ..
Compiler | Clang 12.0.0rc3 | AOMP 11.12-0 | XL 16.1.1-5 | OneAPI 2021.2.0 | Cray 11.0.2 | GCC 11dev 20210315 | NVHPC 21.02 |
---|---|---|---|---|---|---|---|
device | NVIDIA | AMD | NVIDIA | Intel | NVIDIA | NVIDIA | NVIDIA |
math header conflict | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
complex arithmetic | Pass | Pass | Pass | Pass | Fail | Pass | Pass |
math linker error | Pass | Pass | Pass | Pass | Pass | Pass | Fail |
static linking | Fail | Pass | Pass | Pass | Pass | Pass | Pass |
Async tasking | Pass | FC | Pass | FC | FC | FC | Fail |
multiple streams | Pass | Pass | Pass | FC | FC | FC | Pass |
check_spo | Pass | Pass | Pass | Pass(R) | Pass | Pass | Fail |
check_spo_batched | Pass | Pass | Pass | Pass(R) | Pass | Pass | Fail |
miniqmc_sync_move | Pass | Pass | Pass | Pass | Pass | Pass | Pass |
Pass: the intended feature is supported and runs correctly.
Fail: failure at compile, link, or run time, or incorrect results.
FC: functionally correct, runs with correct results.
(R): regression in the current release.