
Issue search results · repo:deepseek-ai/FlashMLA language:Cuda

10 results

In the file flash_fwd_mla_kernel.h, there are several double-buffering processes. In each of these processes, the target offsets are either sK_offset / 8 or sK_offset, where sK_offset is equal to 576 * ...
  • ChengruiZhang
  • 5 comments
  • Opened on Mar 11
  • #67

Building on the Hopper-architecture FlashMLA and FlashAttentionV2, this implements FlashMLA for the Ampere architecture. Even without double-buffering optimization, performance can reach up to ...
  • pzhao-eng
  • 1 comment
  • Opened on Mar 11
  • #66

Can I find the definition of the joint compression of K and V in this code?
  • houghtonweihu
  • 1 comment
  • Opened on Mar 3
  • #58
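
For reference, the DeepSeek-V2 paper defines the joint low-rank compression of keys and values as follows: h_t is the token's hidden state, W^{DKV} is a down-projection to the cached latent c_t^{KV}, and W^{UK}, W^{UV} are up-projections back to keys and values. Whether these projections live in this repository is not stated in the question; presumably an attention kernel like this one receives the already-compressed latent cache rather than computing it.

```latex
\mathbf{c}_t^{KV} = W^{DKV}\,\mathbf{h}_t, \qquad
\mathbf{k}_t^{C} = W^{UK}\,\mathbf{c}_t^{KV}, \qquad
\mathbf{v}_t^{C} = W^{UV}\,\mathbf{c}_t^{KV}
```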

PR https://github.com/deepseek-ai/FlashMLA/pull/54. Intro: Support FP8 WGMMA based on the async pipeline design of FlashMLA. The TransV part draws on the implementation of SmemTranspose64x64 in FA3. Currently, ...
  • endurehero
  • 1 comment
  • Opened on Feb 28
  • #56

Great work, and thanks for sharing it! I am confused about the computational differences between this work and FA3 by Tri Dao. I noticed that FlashMLA doesn't have the up-projection matrix multiplies ...
  • MustafaFayez
  • 1 comment
  • Opened on Feb 27
  • #51
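
One identity that may account for the missing up-projections, based on the DeepSeek-V2 paper's remark that W^{UK} can be absorbed into the query projection and W^{UV} into the output projection. For a single head, ignoring the decoupled RoPE component, with α_{t,j} the softmax attention weights and the symbols as in the DeepSeek-V2 compression equations (c_j^{KV} the cached latent), both the score and the output can be computed directly against the latent. Under this reading the kernel only needs the latent cache and the up-projections move outside it; this is an interpretation, not a statement from the FlashMLA authors.

```latex
\mathbf{q}_t^{C\top}\mathbf{k}_j^{C}
  = \mathbf{q}_t^{C\top} W^{UK}\,\mathbf{c}_j^{KV}
  = \big(W^{UK\top}\mathbf{q}_t^{C}\big)^{\top}\mathbf{c}_j^{KV},
\qquad
\mathbf{o}_t = \sum_j \alpha_{t,j}\,\mathbf{v}_j^{C}
  = W^{UV}\sum_j \alpha_{t,j}\,\mathbf{c}_j^{KV}
```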

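Is there a technical discussion group for this open-source code? It would be good to learn, share ideas, and optimize together.
  • shatealaboxiaowang
  • Opened on Feb 27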
  • #47

Hi all, thanks for your work! I want to run FP8 inference on the V3 model. Are there any hints on how to do that, or any plans to support it?
  • JoeyYoung
  • 5 comments
  • Opened on Feb 25
  • #44

1. Hopper async-copy enhancement // Before: cute::cp_async_fence(); // After: constexpr int kCpAsyncCount = 4; // Use Hopper's four cp.async per cycle #pragma unroll for (int i = 0; i < kCpAsyncCount; ++i) { cute::cp_async<0x80, kCpAsyncCount ...
  • arnewc
  • 9 comments
  • Opened on Feb 24
  • #26

The block_size of traditional paged attention is 16; with head_size 128, one page is 4 KB. Why support a block_size of 64 here?
  • WtDMaO
  • Opened on Feb 24
  • #3