issues Search Results · repo:deepseek-ai/FlashMLA language:Cuda
10 results
In the file flash_fwd_mla_kernel.h, there are several double-buffering processes. In each of these processes, the target
offsets are either sK_offset / 8 or sK_offset, where sK_offset is equal to 576 * ...
ChengruiZhang
- 5
- Opened on Mar 11
- #67
An Ampere-architecture FlashMLA has been implemented, based on the Hopper-architecture FlashMLA and FlashAttentionV2.
Without using double-buffer optimization, the performance can achieve up to ...
pzhao-eng
- 1
- Opened on Mar 11
- #66
Can I find the definition of the joint compression of K and V in this code?
houghtonweihu
- 1
- Opened on Mar 3
- #58
PR
https://github.com/deepseek-ai/FlashMLA/pull/54
Intro
Support FP8 WGMMA based on the async pipeline design of FlashMLA. The TransV part draws on the implementation of
SmemTranspose64x64 in FA3. Currently, ...
endurehero
- 1
- Opened on Feb 28
- #56
Great work, and thanks for sharing it!
I am confused about the differences (computation-wise) between this work and FA3 by Tri Dao. I noticed that FlashMLA doesn't
have the up-projection matrix multiplies ...
MustafaFayez
- 1
- Opened on Feb 27
- #51
Hi all,
Thanks for your work! I want to run FP8 inference on the V3 model. Are there any hints on how to do that, or any plans
to support it?
JoeyYoung
- 5
- Opened on Feb 25
- #44
1. Hopper async-copy enhancement
// Before
cute::cp_async_fence();
// After
constexpr int kCpAsyncCount = 4; // use Hopper's 4 cp.async instructions per cycle
#pragma unroll
for (int i = 0; i < kCpAsyncCount; ++i) {
cute::cp_async 0x80, kCpAsyncCount ...
arnewc
- 9
- Opened on Feb 24
- #26
yoyogis
- 15
- Opened on Feb 24
- #21
The block_size of traditional paged attention is 16; with a head_size of 128, that makes one page 4 KB. Why does this
project support a block_size of 64?
WtDMaO
- Opened on Feb 24
- #3
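The 4 KB figure quoted in #3 can be checked with a short sketch. It assumes 2-byte elements (FP16/BF16) and counts only one of K or V for a single head; the helper name `page_bytes` is hypothetical and not part of FlashMLA.

```cpp
#include <cstddef>

// Hypothetical helper (not part of FlashMLA): bytes occupied by one KV-cache
// page that holds `block_size` tokens, each with `head_size` elements.
constexpr std::size_t page_bytes(std::size_t block_size,
                                 std::size_t head_size,
                                 std::size_t bytes_per_elem) {
    return block_size * head_size * bytes_per_elem;
}

// Assumption: 2-byte elements (FP16/BF16), one of K or V, a single head.
static_assert(page_bytes(16, 128, 2) == 4096, "16 x 128 x 2 B = 4 KB");
// Same head_size with block_size 64 gives a page four times larger.
static_assert(page_bytes(64, 128, 2) == 16384, "64 x 128 x 2 B = 16 KB");
```

Under the same assumptions, a block_size of 64 at head_size 128 would give 16 KB per page. The head dimension and element type that FlashMLA actually uses may differ from these illustrative values.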
