hipblasdgemm not getting close to peak #1705

Open
JorgeG94 opened this issue Apr 6, 2023 · 3 comments

Comments

JorgeG94 commented Apr 6, 2023

What is the expected behavior

  • I would expect a dgemm with sizeable inputs to achieve close to the 47.9 TFLOP/s peak.
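
For comparing a measured run against that figure, here is a minimal sketch of the usual arithmetic, assuming the conventional count of 2·m·n·k floating-point operations per GEMM. The function names and the sample size and timing are hypothetical and are not taken from this issue. Only the 47.9 TFLOP/s figure comes from the report above.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOP/s for one GEMM, counting 2*m*n*k flops (one multiply and one add per term)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * m * n * k / seconds / 1e12


def fraction_of_peak(tflops: float, peak_tflops: float = 47.9) -> float:
    """Ratio of achieved to peak throughput; the default peak is the figure quoted in this issue."""
    return tflops / peak_tflops


if __name__ == "__main__":
    # Hypothetical measurement: an 8192^3 dgemm that took 30 ms of GPU time.
    achieved = gemm_tflops(8192, 8192, 8192, 0.030)
    print(f"{achieved:.1f} TFLOP/s, {100 * fraction_of_peak(achieved):.0f}% of peak")
```

The time should cover only the GEMM itself, ideally averaged over several runs after a warm-up call, so that library initialisation does not distort the result.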

What actually happens

How to reproduce

Environment

Hardware description
  • GPU: MI250X
  • CPU: AMD Optimized 3rd Gen EPYC
Software version
  • ROCm: v5.4.0

JorgeG94 commented Apr 6, 2023

I've tried larger sizes, and at some point the code just fails without ever exceeding 40 TFLOP/s.

Contributor

daineAMD commented Apr 6, 2023

Hi @JorgeG94, thanks for opening this issue.

hipBLAS is just a wrapper library for the rocBLAS and cuBLAS backends. rocBLAS in turn uses the Tensile library for its gemm calls. Since you're looking for better dgemm performance, I think it will be best if I transfer this issue to the Tensile library, where they can hopefully help you out. Performance tuning done there will carry over to rocBLAS and to hipBLAS with the AMD backend.

Thanks,
Daine

daineAMD pinned this issue Apr 6, 2023
daineAMD unpinned this issue Apr 6, 2023
daineAMD transferred this issue from ROCm/hipBLAS Apr 6, 2023
Contributor

nakajee commented Apr 11, 2023

I will check this on my side.
Does the performance drop happen only with this size?
Have you checked other sizes and/or orientations?
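
One way to organise such a check is to enumerate the cases up front. The sketch below only builds the list of square problem sizes crossed with the four transpose combinations. The helper name and the sizes are hypothetical, and each case would still need to be timed with whatever benchmark is already in use.

```python
from itertools import product


def sweep_cases(sizes, orientations=("N", "T")):
    """Return (transA, transB, m, n, k) tuples: every size paired with every A/B orientation."""
    orient_pairs = list(product(orientations, repeat=2))  # NN, NT, TN, TT
    return [(ta, tb, s, s, s) for s in sizes for ta, tb in orient_pairs]


if __name__ == "__main__":
    # Hypothetical sizes; replace with the ones relevant to the application.
    for trans_a, trans_b, m, n, k in sweep_cases([4096, 8192, 16384]):
        print(f"transA={trans_a} transB={trans_b} m={m} n={n} k={k}")
```

Recording the achieved TFLOP/s for each row makes it easy to see whether the shortfall is specific to one size or orientation.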
