Hopper FP8 Blockwise Attention #1690

ipiszy · 2025-06-02T06:02:53Z

Collaborator

This is a discussion of FP8 blockwise scaling in FAv3. Please see this doc for details. The code has been upstreamed to ipiszy/fp8_scaling_recipe.

Motivation

The quantization recipe currently supported by the FAv3 FP8 attention kernel is per-KV-head scaling. This approach gives good kernel performance, but it has two potential drawbacks: 1) for extremely long context lengths, per-KV-head scaling may be too coarse-grained and yield unsatisfactory numerics; 2) for long decode or multi-turn cases where KVs are appended dynamically, updating the per-KV-head scales on the fly is painful. Blockwise scaling is useful when static per-KV-head scaling does not meet the numerical accuracy requirements.
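To make the granularity trade-off concrete, here is a minimal pure-Python sketch (not the FAv3 kernel) that simulates FP8 E4M3 rounding and compares one scale per KV head with one scale per block of tokens. It assumes amax-based scales (scale = amax / 448) and blocks of consecutive tokens along the sequence dimension; the function names, the block size of 2, and the K values are hypothetical and chosen only to make the effect visible. The block shape and scale layout actually used by the kernel are described in the linked doc.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value of float8 E4M3 (e4m3fn)


def round_to_e4m3(x: float) -> float:
    """Simulate casting x to FP8 E4M3: round-to-nearest-even, saturate at +/-448."""
    if x == 0.0:
        return 0.0
    a = min(abs(x), FP8_E4M3_MAX)
    if a < 2.0 ** -6:            # subnormal range: uniform step of 2^-9
        step = 2.0 ** -9
    else:
        _, e = math.frexp(a)     # a = m * 2**e with 0.5 <= m < 1
        step = 2.0 ** (e - 4)    # 3 mantissa bits after the implicit leading 1
    q = min(round(a / step) * step, FP8_E4M3_MAX)
    return math.copysign(q, x)


def amax_scale(rows):
    """Scale that maps the largest |value| in `rows` onto FP8_E4M3_MAX (assumed recipe)."""
    amax = max(abs(v) for row in rows for v in row)
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def fake_quant(rows, scale):
    """Quantize to simulated FP8 with `scale`, then dequantize back to float."""
    return [[round_to_e4m3(v / scale) * scale for v in row] for row in rows]


def per_head_quant(k):
    """Per-KV-head recipe (simulated): one scale shared by every token of the head."""
    s = amax_scale(k)
    return fake_quant(k, s), [s]


def blockwise_quant(k, block_size):
    """Blockwise recipe (simulated, hypothetical layout): one scale per token block."""
    out, scales = [], []
    for start in range(0, len(k), block_size):
        block = k[start:start + block_size]
        s = amax_scale(block)
        scales.append(s)
        out.extend(fake_quant(block, s))
    return out, scales


def max_rel_err(ref, approx):
    """Largest relative error over all nonzero reference values."""
    return max(abs(a - r) / abs(r)
               for ref_row, approx_row in zip(ref, approx)
               for r, a in zip(ref_row, approx_row) if r != 0)


# Hypothetical K for one KV head: 4 tokens x head_dim 4.
# Tokens 0-1 hold tiny values; tokens 2-3 hold large outliers.
k = [[4e-4, -3e-4, 5e-4, 2e-4],
     [3e-4, 4e-4, -2e-4, 5e-4],
     [250.0, -300.0, 120.0, 80.0],
     [60.0, -45.0, 200.0, 10.0]]

k_head, head_scales = per_head_quant(k)
k_block, block_scales = blockwise_quant(k, block_size=2)
print(max_rel_err(k[:2], k_head[:2]))   # 1.0: tokens 0-1 are flushed to zero
print(max_rel_err(k[:2], k_block[:2]))  # about 0.048

# Decode appends a new block containing a larger value.
new_block = [[600.0, -20.0, 5.0, 1.0], [30.0, 40.0, -50.0, 60.0]]
_, head_scales_after = per_head_quant(k + new_block)
_, block_scales_after = blockwise_quant(k + new_block, block_size=2)
print(head_scales_after[0] == head_scales[0])  # False: the whole head needs requantizing
print(block_scales_after[:2] == block_scales)  # True: existing block scales are untouched
```

With a single per-head scale, the outlier tokens set the scale and the small tokens fall below half of the smallest FP8 subnormal, so they round to zero; per-block scales keep them within a few percent. Appending a block in this sketch only adds one new scale, whereas a larger value under the per-head recipe changes the single scale, which would mean requantizing the cached K.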

Design

See the doc for details.

Benchmark Results

Overall, the FP8 attention kernel's latency is as low as 62% of the BF16 attention kernel's latency for prefill and 52% for decode, with no performance degradation at short context lengths. See the doc for details.

Current Status

The current code only supports fixed sequence lengths. More tests (and bug fixes) are needed to support var-seq-len, paged KV, etc.

biandangan · 2025-09-09T01:46:24Z

Questions:

Do the per-token scales for Q/K/V need to be precomputed before calling the FP8 attention kernel?
The latency of per-token scaling is on par with per-head scaling. How about the accuracy?

XiaotaoChen · 2025-12-15T07:57:15Z

@ipiszy Hi, I'm interested in FP8 attention. It can speed up the attention computation and save KV cache memory, which enables longer context lengths or larger batch sizes. Can you share any accuracy results for FP8 attention?
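As a rough illustration of the memory side, here is a back-of-the-envelope KV cache estimator. The model shape, the 128-token block size, and the 4-byte (fp32) per-block scales are assumptions for illustration only, not the layout used in this PR.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seqlen, batch,
                   bytes_per_elem, block_size=None, scale_bytes=4):
    """Estimate KV cache size in bytes (K and V, all layers)."""
    elems = 2 * num_layers * num_kv_heads * head_dim * seqlen * batch
    total = elems * bytes_per_elem
    if block_size is not None:
        # Hypothetical layout: one scale per (K/V, layer, KV head, token block, sequence).
        num_blocks = -(-seqlen // block_size)  # ceiling division
        total += 2 * num_layers * num_kv_heads * num_blocks * batch * scale_bytes
    return total


# Hypothetical model: 32 layers, 8 KV heads, head_dim 128, one 32K-token sequence.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128, seqlen=32768, batch=1)
bf16 = kv_cache_bytes(**cfg, bytes_per_elem=2)
fp8 = kv_cache_bytes(**cfg, bytes_per_elem=1, block_size=128)
print(bf16 / 2**30, fp8 / 2**30)  # prints 4.0 2.00048828125 (GiB)
```

Under these assumptions the per-block scales add only 0.5 MiB, so FP8 with blockwise scaling stays close to half the BF16 footprint.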
