[Feature][AMD] Adding fp8 Gemm Computation #31
Conversation
Thanks for your hard work on this! I've given the PR a very quick look and given a few comments mostly relating to style/structure. Will do a more in-depth read once it's a bit more mature.
csrc/pybind.cpp (Outdated)
ops.def("fp8_gemm", &fp8_gemm, "fp8 GEMM");
ops.def("fp8_gemm_16", &fp8_gemm_16, "fp8 GEMM");
Nit: better description explaining the difference between these two.
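For illustration, the two bindings could carry descriptions that state what actually differs between them. The wording below is a hypothetical sketch: the "_16" suffix is assumed here to mean an fp16 output variant, which should be confirmed against the kernels before adopting these strings.

```cpp
// Hypothetical clearer descriptions (the assumed fp8-vs-fp16 output
// distinction must be verified against the actual kernel implementations):
ops.def("fp8_gemm", &fp8_gemm,
        "fp8 GEMM producing fp8 output");
ops.def("fp8_gemm_16", &fp8_gemm_16,
        "fp8 GEMM producing fp16 output");
```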
Sure, will do.
More changes needed:
7. Code style and line formatting - vLLM requires Google style; as it currently stands the code will trigger many errors, e.g. keep line breaks at 80 characters max, etc. Use our previously merged code as an example to be safe.
8. fp8_gemm - please consider adding bias support.
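Conceptually, fp8 GEMM with bias computes Y = (A·B)·(scale_A·scale_B) + b: inputs are quantized to fp8 with per-tensor scales, multiplied with higher-precision accumulation, then dequantized and biased. The NumPy sketch below simulates this scheme; the function names, the per-tensor scaling, and the crude rounding stand-in for fp8 precision are all illustrative assumptions, not the PR's actual implementation.

```python
import numpy as np

def quantize_fp8_e4m3(x: np.ndarray):
    """Simulate fp8 e4m3 quantization with a per-tensor scale.

    This is a coarse approximation (uniform rounding, not bit-exact
    fp8 with per-value exponents), used only to illustrate the flow.
    """
    FP8_E4M3_MAX = 448.0  # largest finite e4m3 value
    scale = max(np.abs(x).max() / FP8_E4M3_MAX, 1e-12)
    # Crude precision loss stand-in: round the scaled values to a coarse grid.
    q = np.round((x / scale) * 8.0) / 8.0
    return q.astype(np.float32), np.float32(scale)

def fp8_gemm_with_bias(a: np.ndarray, b: np.ndarray, bias: np.ndarray):
    """fp8 GEMM sketch: quantize, multiply with fp32 accumulation,
    apply combined dequantization scale, then add bias."""
    qa, sa = quantize_fp8_e4m3(a)
    qb, sb = quantize_fp8_e4m3(b)
    return (qa @ qb) * (sa * sb) + bias
```

The bias add happens after dequantization, so the bias can stay in full precision regardless of the quantization scheme chosen for the matrix operands.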
Do we have the Dockerfile updated to reflect the changes needed for libraries - hipblas, etc.?
Dockerfile.rocm in our fork has all that's needed. Dockerfile.rocm in upstream should also work, but I doubt it was tested for fp8.
Please rename csrc/quantization/fp8/amd/gemm_kernel.cu - it isn't a CUDA kernel file, so the .cu extension isn't appropriate. It uses the generic cublas API.
LGTM for now; we can address the remaining open issues continuously. Once merged to fp8-gemm, we can put obvious fixes, build fixes, etc. on fp8-gemm directly; anything more involved can go through this or other branches targeting fp8-gemm, until the upcoming UPSTREAM from fp8-gemm is finally merged.
This PR adds fp8 gemm computation support on AMD GPUs.
- Separates vllm.ops from vllm.cache_ops.
- Adds Fp8RocmConfig and Fp8RocmLinearMethod for creating fp8 weights and conducting fp8 gemm computation.