Differences in throughput for an application in HIP/SYCL #3441

Closed
jinz2014 opened this issue Apr 9, 2024 · 1 comment

Comments

jinz2014 commented Apr 9, 2024

The attached paper shows that the throughput of the application in SYCL is higher than that of the HIP program, but it does not explain the performance difference.

Unlocking performance portability on LUMI-G supercomputer: A virtual screening case study
3648115.3648125.pdf

@bdenhollander

5.1.1 Software stack. [...]
Moreover, we used the HIPIFY tool 4 to automatically generate a HIP implementation from the CUDA one, based on HIP 5.3. We have used the ROCm LLVM’s to perform a code build of the HIP version on AMD GPUs

5.2 Single GPU performance portability [...]
Moreover, we include an automatically generated HIP version for AMD GPUs, while for NVIDIA GPUs, we include a hand-optimized CUDA version.

It doesn't sound like they made any effort to tune the generated HIP version. The CUDA results on the A100 are double those of AdaptiveCpp, so there's a good chance that a hand-optimized HIP version could also outperform SYCL.

@jinz2014 jinz2014 closed this as completed May 2, 2024