The code is in the repo https://github.com/JorgeG94/calum_performance_tool, which has a README, but basically:
hipcc -L/opt/rocm-5.4.3/lib -lhipblas --offload-arch=gfx90a performance.cpp
./a.out 36000 14400 36000 10 T T
Environment

| Hardware | description |
| --- | --- |
| GPU | MI250x |
| CPU | AMD Optimized 3rd Gen EPYC |

| Software | version |
| --- | --- |
| ROCM | v5.4.0 |
hipBLAS is just a wrapper library for rocBLAS/cuBLAS backends. rocBLAS then uses the Tensile library for calls to gemm. Since you're looking for better performance in dgemm, I think it will be best if I transfer this issue to the Tensile library where they can hopefully help you out. Performance tuning done there will be realized in rocBLAS and hipBLAS w/ AMD backend.
What is the expected behavior
What actually happens
How to reproduce
hipcc -L/opt/rocm-5.4.3/lib -lhipblas --offload-arch=gfx90a performance.cpp
./a.out 36000 14400 36000 10 T T
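To compare runs of the reproducer, it can help to turn a measured time per call into a throughput figure. The sketch below uses the conventional 2*m*n*k FLOP count for a GEMM. It assumes, without confirmation from the repo, that the first three command-line arguments are the matrix dimensions m, n and k; the helper names `gemm_flops` and `gemm_tflops` are hypothetical and are not part of performance.cpp.

```cpp
#include <cstdint>

// Hypothetical helper (not from performance.cpp): FLOP count of one GEMM
// C = alpha * op(A) * op(B) + beta * C, using the usual 2*m*n*k convention.
// Assumption: m, n, k correspond to the first three arguments of ./a.out.
double gemm_flops(std::int64_t m, std::int64_t n, std::int64_t k) {
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) *
           static_cast<double>(k);
}

// Hypothetical helper: achieved TFLOP/s, given the average wall-clock
// time of a single GEMM call in seconds.
double gemm_tflops(std::int64_t m, std::int64_t n, std::int64_t k,
                   double seconds_per_call) {
    return gemm_flops(m, n, k) / seconds_per_call / 1e12;
}
```

Under that assumption, `gemm_flops(36000, 14400, 36000)` is 3.73248e13 FLOPs per call, so dividing by the measured seconds per call gives a number that can be set against the device's nominal FP64 peak.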
Environment