[V1] feat: support fp8 kv on ampere through flashinfer #1440

Merged: AlpinDale merged 1 commit into main from flashinfer-fp8-kv on Aug 24, 2025

@AlpinDale (Member) commented Aug 24, 2025

Works with prefix caching and chunked prefill. Tested on an A100.

FlashInfer needs to be installed first:

pip install -U flashinfer-python

FlashInfer JIT-compiles its CUDA kernels, so the first request will take a few minutes to complete.

Then launch with the FlashInfer backend and an fp8 KV cache:

APHRODITE_ATTENTION_BACKEND="FLASHINFER" aphrodite run Qwen/Qwen3-0.6B \
    --kv-cache-dtype fp8
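
For a rough sense of what `--kv-cache-dtype fp8` buys, here is a back-of-the-envelope sketch of per-token KV cache size at 2 bytes per element versus 1. The layer count, KV head count, and head dimension below are assumed values for Qwen/Qwen3-0.6B and should be checked against the model's `config.json`; the helper `kv_cache_bytes_per_token` is hypothetical and not part of Aphrodite or FlashInfer. It also ignores any scaling-factor or paging overhead.

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_elem):
    """Approximate KV cache footprint of one token (hypothetical helper)."""
    # Each layer stores one K and one V vector per KV head, hence the factor 2.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


# Assumed model shape for illustration only; verify against config.json.
NUM_LAYERS = 28
NUM_KV_HEADS = 8
HEAD_DIM = 128

fp16_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, 2)
fp8_bytes = kv_cache_bytes_per_token(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, 1)

print(f"16-bit KV cache: {fp16_bytes} bytes/token")
print(f"fp8 KV cache:    {fp8_bytes} bytes/token")
print(f"ratio:           {fp16_bytes / fp8_bytes:.1f}x")
```

Under these assumptions the fp8 cache needs half the bytes per token, so roughly twice as many tokens fit in the same KV cache budget.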

@AlpinDale merged commit bca9103 into main on Aug 24, 2025
0 of 4 checks passed
@AlpinDale deleted the flashinfer-fp8-kv branch on August 24, 2025 21:19