The attached paper shows that the SYCL version of the application achieves higher throughput than the HIP version, but it does not explain the performance difference.
Unlocking performance portability on LUMI-G supercomputer: A virtual screening case study
3648115.3648125.pdf
5.1.1 Software stack. [...]
Moreover, we used the HIPIFY tool⁴ to automatically generate a HIP implementation from the CUDA one, based on HIP 5.3. We have used the ROCm LLVM’s to perform a code build of the HIP version on AMD GPUs
5.2 Single GPU performance portability [...]
Moreover, we include an automatically generated HIP version for AMD GPUs, while for NVIDIA GPUs, we include a hand-optimized CUDA version.
It doesn't sound like they made any effort to tune the generated HIP version. On the A100, the hand-optimized CUDA version's results are double those of AdaptiveCpp, so there's a good chance that a hand-optimized HIP version could also outperform SYCL.
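HIPIFY is a source-to-source translator: it rewrites CUDA API names into their HIP equivalents. The Python snippet below is a toy model of that renaming idea, not HIPIFY itself; the rename table, `hipify_sketch`, and the CUDA fragment are hypothetical. It illustrates why an automatically generated port keeps whatever tuning choices the CUDA code made for NVIDIA hardware.

```python
import re

# Hypothetical, deliberately tiny rename table. The real HIPIFY tools
# (hipify-perl, hipify-clang) map a much larger set of CUDA APIs.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# One alternation, longest names first, anchored on word boundaries so that
# "cudaMemcpyHostToDevice" is replaced as a whole rather than as "cudaMemcpy".
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def hipify_sketch(cuda_src: str) -> str:
    """Rename known CUDA identifiers to their HIP counterparts.

    Only names are rewritten: block sizes, warp-size constants, memory access
    patterns and other tuning choices made for NVIDIA GPUs pass through as-is.
    """
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], cuda_src)


# Hypothetical CUDA host-code fragment used only for illustration.
EXAMPLE_CUDA = """#include <cuda_runtime.h>
#define WARP_SIZE 32  // NVIDIA-specific tuning constant
cudaMalloc(&d_buf, n * sizeof(float));
cudaMemcpy(d_buf, h_buf, n * sizeof(float), cudaMemcpyHostToDevice);
kernel<<<blocks, WARP_SIZE * 4>>>(d_buf, n);
cudaDeviceSynchronize();
cudaFree(d_buf);
"""

if __name__ == "__main__":
    print(hipify_sketch(EXAMPLE_CUDA))
```

In the output every `cuda*` name becomes its `hip*` counterpart, while `WARP_SIZE 32` and the launch configuration are untouched. On AMD GPUs such as the MI250X in LUMI-G, whose wavefronts are 64 lanes wide, constants like these are the kind of thing a hand-tuned HIP port would revisit.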