
[Feature][AMD] Adding fp8 Gemm Computation #31

Merged - 73 commits merged into fp8-gemm on Jun 28, 2024

Conversation

@charlifu commented Jun 3, 2024

This PR adds fp8 GEMM computation support on AMD GPUs.

  • Moved convert_fp8 kernels to vllm.ops from vllm.cache_ops.
  • Added Fp8RocmConfig and Fp8RocmLinearMethod for creating fp8 weights and conducting fp8 GEMM computation.
  • Added PyTorch C++ ops wrapping hipblaslt kernels for fp8 GEMM (a sketch of the binding layer follows).
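For orientation, a minimal sketch of what the binding layer looks like, following the PYBIND11_MODULE pattern used in csrc/pybind.cpp; the tensor and scale arguments shown here are assumptions for illustration, not the PR's exact signatures:

    #include <torch/extension.h>

    // hipblaslt-backed kernels from csrc/quantization/fp8/amd/gemm_kernel.cu
    // (hypothetical signatures; the real ones are defined by the PR).
    torch::Tensor fp8_gemm(torch::Tensor& a, torch::Tensor& b,
                           torch::Tensor& scale_a, torch::Tensor& scale_b);
    torch::Tensor fp8_gemm_16(torch::Tensor& a, torch::Tensor& b,
                              torch::Tensor& scale_a, torch::Tensor& scale_b);

    PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
      pybind11::module ops = m.def_submodule("ops", "vLLM custom operators");
      ops.def("fp8_gemm", &fp8_gemm, "fp8 GEMM");
      ops.def("fp8_gemm_16", &fp8_gemm_16, "fp8 GEMM");
    }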

@mawong-amd left a comment


Thanks for your hard work on this! I've given the PR a very quick look and left a few comments, mostly relating to style/structure. Will do a more in-depth read once it's a bit more mature.

csrc/pybind.cpp, comment on lines 73 to 74 (outdated):
ops.def("fp8_gemm", &fp8_gemm, "fp8 GEMM");
ops.def("fp8_gemm_16", &fp8_gemm_16, "fp8 GEMM");


Nit: give these better descriptions explaining the difference between the two.
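For example (assuming, from the names alone, that the _16 variant returns a 16-bit result while the other returns fp8; the author should confirm the actual semantics):

    // Assumed semantics based on naming only.
    ops.def("fp8_gemm", &fp8_gemm, "fp8 GEMM with fp8 output");
    ops.def("fp8_gemm_16", &fp8_gemm_16, "fp8 GEMM with 16-bit output");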

@charlifu (Author) replied:

Sure, will do.

Other review threads (resolved):
  • csrc/quantization/fp8/amd/gemm_kernel.cu (3 threads, outdated)
  • csrc/quantization/fp8/amd/quant_utils.cuh
  • csrc/cache_kernels.cu (outdated)
  • csrc/ops.h (outdated)
  • csrc/pybind.cpp (outdated)
  • vllm/model_executor/layers/quantization/fp8_rocm.py (outdated)
  • vllm/model_executor/models/llama.py (outdated)
@HaiShaw left a comment


More changes needed:
7. Code style and line formats - vLLM requires Google style; as it stands this will trigger many errors (e.g., keep line breaks at 80 characters max). Use our previously merged code as an example to be safe.
8. fp8_gemm - please consider adding bias support (see the sketch after this list).
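One possible shape for item 8, sketched as a hypothetical interface; the parameter names and the use of an optional bias are assumptions, not the PR's code:

    #include <torch/extension.h>

    // Hypothetical bias-enabled variant. Inside the kernel, hipblaslt can
    // fuse the add into the GEMM epilogue by setting the matmul descriptor's
    // epilogue to HIPBLASLT_EPILOGUE_BIAS and passing the bias pointer via
    // HIPBLASLT_MATMUL_DESC_BIAS_POINTER, avoiding a separate kernel launch.
    // Making the bias optional keeps existing call sites working.
    torch::Tensor fp8_gemm_16(torch::Tensor& a, torch::Tensor& b,
                              torch::Tensor& scale_a, torch::Tensor& scale_b,
                              const c10::optional<torch::Tensor>& bias);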

Review thread on csrc/quantization/fp8/amd/gemm_kernel.cu (outdated, resolved)
@HaiShaw commented Jun 26, 2024

Do we have the Dockerfile updated to reflect the changes needed for libraries - hipblas, etc.?
Providing a reference public Docker image (on MI300x) can be an alternative, but we need to describe it in the writeup.

@gshtras commented Jun 26, 2024

> Do we have the Dockerfile updated to reflect the changes needed for libraries - hipblas, etc.? Providing a reference public Docker image (on MI300x) can be an alternative, but we need to describe it in the writeup.

Dockerfile.rocm in our fork has all that's needed. Dockerfile.rocm in upstream should also work, but I doubt it was tested for fp8

@HaiShaw left a comment


Please rename csrc/quantization/fp8/amd/gemm_kernel.cu; it isn't a CUDA kernel file, so using .cu isn't appropriate.

@gshtras commented Jun 28, 2024

> Please rename csrc/quantization/fp8/amd/gemm_kernel.cu; it isn't a CUDA kernel file, so using .cu isn't appropriate.

It uses the generic cublas API, and therefore:

  1. It is CUDA compatible.
  2. It needs to be hipified.
  3. It needs to keep this extension for CMake to work correctly and apply the hipifier (illustrated below).
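To make point 2 concrete, a simplified, hypothetical fragment showing what the hipify pass does to such a file (the real file wraps the Lt API, which is translated the same way):

    #include <cublas_v2.h>

    // Written against the generic cuBLAS API, this compiles as-is with nvcc.
    void make_handle() {
      cublasHandle_t handle = nullptr;
      cublasCreate(&handle);
      cublasDestroy(handle);
    }

    // After the hipify pass, which the build applies to .cu files when
    // targeting ROCm, the same code becomes, roughly:
    //
    //   #include <hipblas/hipblas.h>
    //   hipblasHandle_t handle = nullptr;
    //   hipblasCreate(&handle);
    //   hipblasDestroy(handle);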

@HaiShaw left a comment


LGTM for now, we can address remaining open issues continuously.

Once merged into fp8-gemm, we can put obvious fixes, build fixes, etc. on fp8-gemm directly; anything more involved can go through this or other branches targeting fp8-gemm, until the upcoming UPSTREAM merge from fp8-gemm finally lands.

@HaiShaw merged commit 290e4ab into fp8-gemm on Jun 28, 2024