
[Feature][AMD] Adding fp8 Gemm Computation #31

Merged - 73 commits merged into fp8-gemm on Jun 28, 2024

Conversation

@charlifu commented Jun 3, 2024

This PR adds fp8 GEMM computation support on AMD GPUs.

  • Moved convert_fp8 kernels to vllm.ops from vllm.cache_ops.
  • Added Fp8RocmConfig and Fp8RocmLinearMethod for creating fp8 weights and conducting fp8 GEMM computation.
  • Added PyTorch C++ ops wrapping hipblaslt kernels for fp8 GEMM (a sketch of the binding layer follows).
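For orientation, a minimal sketch of what the binding layer looks like, following the PYBIND11_MODULE pattern used in csrc/pybind.cpp; the tensor and scale arguments shown here are assumptions for illustration, not the PR's exact signatures:

    #include <torch/extension.h>

    // hipblaslt-backed kernels from csrc/quantization/fp8/amd/gemm_kernel.cu
    // (hypothetical signatures; the real ones are defined by the PR).
    torch::Tensor fp8_gemm(torch::Tensor& a, torch::Tensor& b,
                           torch::Tensor& scale_a, torch::Tensor& scale_b);
    torch::Tensor fp8_gemm_16(torch::Tensor& a, torch::Tensor& b,
                              torch::Tensor& scale_a, torch::Tensor& scale_b);

    PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
      pybind11::module ops = m.def_submodule("ops", "vLLM custom operators");
      ops.def("fp8_gemm", &fp8_gemm, "fp8 GEMM");
      ops.def("fp8_gemm_16", &fp8_gemm_16, "fp8 GEMM");
    }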

@mawong-amd left a comment


Thanks for your hard work on this! I've given the PR a very quick look and left a few comments, mostly relating to style/structure. Will do a more in-depth read once it's a bit more mature.

csrc/pybind.cpp, comment on lines 73 to 74 (outdated):
ops.def("fp8_gemm", &fp8_gemm, "fp8 GEMM");
ops.def("fp8_gemm_16", &fp8_gemm_16, "fp8 GEMM");


Nit: give these better descriptions explaining the difference between the two.
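For example (assuming, from the names alone, that the _16 variant returns a 16-bit result while the other returns fp8; the author should confirm the actual semantics):

    // Assumed semantics based on naming only.
    ops.def("fp8_gemm", &fp8_gemm, "fp8 GEMM with fp8 output");
    ops.def("fp8_gemm_16", &fp8_gemm_16, "fp8 GEMM with 16-bit output");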

@charlifu (Author) replied:

Sure, will do.

Other review threads (resolved):
  • csrc/quantization/fp8/amd/gemm_kernel.cu (3 threads, outdated)
  • csrc/quantization/fp8/amd/quant_utils.cuh
  • csrc/cache_kernels.cu (outdated)
  • csrc/ops.h (outdated)
  • csrc/pybind.cpp (outdated)
  • vllm/model_executor/layers/quantization/fp8_rocm.py (outdated)
  • vllm/model_executor/models/llama.py (outdated)
@HaiShaw left a comment


More changes needed:
7. Code style and line formats - vLLM requires Google style; as it stands this will trigger many errors (e.g., keep line breaks at 80 characters max). Use our previously merged code as an example to be safe.
8. fp8_gemm - please consider adding bias support (see the sketch after this list).
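One possible shape for item 8, sketched as a hypothetical interface; the parameter names and the use of an optional bias are assumptions, not the PR's code:

    #include <torch/extension.h>

    // Hypothetical bias-enabled variant. Inside the kernel, hipblaslt can
    // fuse the add into the GEMM epilogue by setting the matmul descriptor's
    // epilogue to HIPBLASLT_EPILOGUE_BIAS and passing the bias pointer via
    // HIPBLASLT_MATMUL_DESC_BIAS_POINTER, avoiding a separate kernel launch.
    // Making the bias optional keeps existing call sites working.
    torch::Tensor fp8_gemm_16(torch::Tensor& a, torch::Tensor& b,
                              torch::Tensor& scale_a, torch::Tensor& scale_b,
                              const c10::optional<torch::Tensor>& bias);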

Review thread on csrc/quantization/fp8/amd/gemm_kernel.cu (outdated, resolved)
@HaiShaw commented Jun 26, 2024

Do we have the Dockerfile updated to reflect the changes needed for libraries - hipblas, etc.?
Providing a reference public Docker image (on MI300x) can be an alternative, but we need to describe it in the writeup.

@gshtras commented Jun 26, 2024

> Do we have the Dockerfile updated to reflect the changes needed for libraries - hipblas, etc.? Providing a reference public Docker image (on MI300x) can be an alternative, but we need to describe it in the writeup.

Dockerfile.rocm in our fork has all that's needed. Dockerfile.rocm in upstream should also work, but I doubt it was tested for fp8

@HaiShaw left a comment


Please rename csrc/quantization/fp8/amd/gemm_kernel.cu; it isn't a CUDA kernel file, so using .cu isn't appropriate.

@gshtras commented Jun 28, 2024

> Please rename csrc/quantization/fp8/amd/gemm_kernel.cu; it isn't a CUDA kernel file, so using .cu isn't appropriate.

It uses the generic cublas API, and therefore:

  1. It is CUDA compatible.
  2. It needs to be hipified.
  3. It needs to keep this extension for CMake to work correctly and apply the hipifier (illustrated below).
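To make point 2 concrete, a simplified, hypothetical fragment showing what the hipify pass does to such a file (the real file wraps the Lt API, which is translated the same way):

    #include <cublas_v2.h>

    // Written against the generic cuBLAS API, this compiles as-is with nvcc.
    void make_handle() {
      cublasHandle_t handle = nullptr;
      cublasCreate(&handle);
      cublasDestroy(handle);
    }

    // After the hipify pass, which the build applies to .cu files when
    // targeting ROCm, the same code becomes, roughly:
    //
    //   #include <hipblas/hipblas.h>
    //   hipblasHandle_t handle = nullptr;
    //   hipblasCreate(&handle);
    //   hipblasDestroy(handle);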

@HaiShaw left a comment


LGTM for now, we can address remaining open issues continuously.

Once merged into fp8-gemm, we can put obvious fixes, build fixes, etc. on fp8-gemm directly; anything more involved can go through this or other branches targeting fp8-gemm, until the upcoming UPSTREAM merge from fp8-gemm finally lands.

@HaiShaw merged commit 290e4ab into fp8-gemm on Jun 28, 2024