
gemm performance downgrade for small size M and big size N&K #520

Open
abcdrm opened this issue Jan 10, 2024 · 1 comment

Comments


abcdrm commented Jan 10, 2024

Hi, I ran CLBlast Hgemm and Hgemv on a Qualcomm Adreno 730 GPU with matrix shape M = 8, N = 3200, K = 3200.
Hgemm took 8.05401 ms (average over 1000 runs), which is much slower than running Hgemv 8 times in a for loop (0.500931 ms per iteration, about 4 ms total).
Here is the code calling gemm and gemv:
CLBlastHgemm(CLBlastLayout::CLBlastLayoutRowMajor, CLBlastTranspose::CLBlastTransposeNo, CLBlastTranspose::CLBlastTransposeNo, M, N, K, alpha, A_mat(), 0, K, B_mat(), 0, N, beta, C_mat(), 0, N, &command_queue(), nullptr);

CLBlastHgemv(CLBlastLayout::CLBlastLayoutRowMajor, CLBlastTranspose::CLBlastTransposeYes, K, N, alpha, B_mat(), b_offset * i, N, A_mat(), 0, 1, beta, C_mat(), 0, 1, &command_queue(), nullptr);

@CNugteren
Owner

Thank you for reporting this. Yes, for unusual shapes like this it can happen that running GEMV multiple times is faster than running GEMM once. I'm not sure there is much we can do here, since the optimization space is vast.

You could try running the tuner specifically for these matrix sizes, but most likely it won't beat your current GEMV solution.
