Refactor custom gemm heuristics #56

Merged: 8 commits, Jun 20, 2024

Conversation

@gshtras (Collaborator) commented Jun 19, 2024

Moves the custom skinny GEMM heuristic so that it is checked before the hipBLAS or rocBLAS solutions.
Disables the now-obsolete LLMM1 path, which is fully covered by the new kernel.
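
A minimal sketch of the ordering described above, assuming a dispatcher that tries a skinny-shape heuristic first and only then consults tuned library solutions. The function names, the `max_rows` threshold, and the solution table are hypothetical illustrations and are not taken from `tuned_gemm.py`.

```python
from typing import Dict, Tuple

# Hypothetical shape convention: m = input rows, n = output features, k = reduction dim.
Shape = Tuple[int, int, int]


def is_skinny(m: int, n: int, k: int, max_rows: int = 4) -> bool:
    """Hypothetical heuristic: a GEMM is 'skinny' when it has very few input rows.

    The max_rows default is a placeholder, not a value from the PR.
    """
    return 1 <= m <= max_rows


def choose_gemm_path(m: int, n: int, k: int,
                     tuned_solutions: Dict[Shape, str]) -> str:
    """Return the name of the path that would handle this GEMM."""
    # 1. The custom skinny kernel is considered first, even if a tuned
    #    library solution exists for the same shape.
    if is_skinny(m, n, k):
        return "custom_skinny"
    # 2. Otherwise fall back to a tuned library solution, if one was recorded.
    solution = tuned_solutions.get((m, n, k))
    if solution is not None:
        return solution
    # 3. Otherwise use the framework's default matmul.
    return "default_matmul"
```

With this ordering, a shape that qualifies as skinny never reaches the library lookup, which is the behavior change the description refers to.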

@gshtras requested a review from mawong-amd June 19, 2024 18:26
@gshtras requested a review from dllehr-amd June 19, 2024 18:32

@mawong-amd left a comment

Mostly looks good, assuming performance testing shows no regressions.

vllm/model_executor/layers/tuned_gemm.py (outdated)
weights.shape[0],
dtype=inp_view.dtype,
device='cuda')
_custom_C.wvSpltK(weights, inp_view, out, n, self.cu_count)

Not something that needs to change right now, but we probably want to refactor this eventually so that the MP core count is determined at the C++ level. IMO it's not good decomposition to have it here.
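
One way to picture the decomposition suggested here, sketched in Python for brevity rather than in C++: the kernel wrapper looks up the compute-unit count itself and caches it, so callers no longer pass it in. Everything below is hypothetical. `_query_cu_count` is a stand-in for a real device query and returns a placeholder value, and `skinny_gemm` only returns a launch description instead of calling `_custom_C.wvSpltK`.

```python
from functools import lru_cache


def _query_cu_count() -> int:
    """Stand-in for a real device query (hypothetical).

    The returned value is a placeholder, not a property of any real device.
    """
    return 8


@lru_cache(maxsize=None)
def cu_count() -> int:
    """Query the compute-unit count once and reuse the cached result."""
    return _query_cu_count()


def skinny_gemm(weights, inp, n: int) -> dict:
    """Hypothetical wrapper: note there is no cu_count argument.

    The wrapper owns the device lookup, so the Python layer that selects
    the kernel does not need to know about hardware details.
    """
    return {"n": n, "cu_count": cu_count()}
```

The caller's signature shrinks by one argument, and the device query lives in a single place next to the kernel that needs it.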

vllm/model_executor/layers/tuned_gemm.py
@gshtras requested a review from mawong-amd June 20, 2024 17:09

@mawong-amd left a comment

Looks good!

@gshtras merged commit 4460294 into main Jun 20, 2024
13 checks passed
@gshtras deleted the refactor_custom_gemm_heuristic branch June 20, 2024 22:11