Issue search results · repo:deepseek-ai/FlashMLA language:C++

42 results

https://github.com/deepseek-ai/FlashMLA/blob/b31bfe72a83ea205467b3271a5845440a03ed7cb/csrc/flash_api.cpp#L184 Hi all, just wondering why the shapes of O_accum and LSE_accum change from [numsplit, batch, ...
  • mingyangHao
  • Opened on Apr 7
  • #70

@sijiac Why add a permanently false condition for `determine`? https://github.com/deepseek-ai/FlashMLA/blob/b31bfe72a83ea205467b3271a5845440a03ed7cb/csrc/flash_fwd_mla_kernel.h#L528
  • LearnerInGithub
  • 1
  • Opened on Mar 6
  • #64

@sijiac Hello everyone! I want to raise a question about the usage of the CUDA qualifier launch_bounds. In the CUDA documentation, launch_bounds() takes only two parameters: maxThreadsPerBlock and minBlocksPerMultiprocessor. ...
  • LearnerInGithub
  • 2
  • Opened on Mar 6
  • #63
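For context on the question in #63, a minimal sketch of the qualifier's documented two-argument form and the newer three-argument form. CUDA 11.8 added an optional third argument, maxBlocksPerCluster, for thread-block clusters on compute capability 9.0 (Hopper), which may be the extra parameter being asked about. The kernel names and bodies below are illustrative, not FlashMLA's code:

```cuda
// Documented two-argument form: an upper bound on threads per block, and a
// desired minimum number of resident blocks per SM (constrains register use).
__global__ void __launch_bounds__(256, 1)
kernel_two_args(float* out) {
    out[blockIdx.x * blockDim.x + threadIdx.x] = 0.f;
}

// Since CUDA 11.8, an optional third argument (maxBlocksPerCluster) bounds
// the thread-block cluster size when targeting sm_90 (Hopper).
__global__ void __launch_bounds__(256, 1, 2)
kernel_three_args(float* out) {
    out[blockIdx.x * blockDim.x + threadIdx.x] = 0.f;
}
```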

Hope it helps someone: https://www.youtube.com/watch?v=0VLAoVGf_74
  • leo-smi
  • Opened on Mar 6
  • #62

I read the code of FlashMLA, and I have some questions: 1. Why not use TMA to load Q/K, instead of the SM80 copy_async? 2. To store data from registers to global memory, it uses shared memory to change ...
  • Idonthaveaname-wq
  • 1
  • Opened on Mar 5
  • #61
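On question 1 in #61, for anyone comparing the two copy paths: cp.async (the SM80 mechanism, exposed through the cuda_pipeline primitives) is issued per thread, while Hopper's TMA copies a whole tile described by a tensor-map descriptor with a single instruction issued by one thread. A minimal, illustrative cp.async sketch (not FlashMLA's code; assumes a 128-thread block and 16-byte-aligned pointers):

```cuda
#include <cuda_pipeline.h>  // __pipeline_memcpy_async and friends (SM80+)

__global__ void load_tile_cp_async(const float4* __restrict__ gmem) {
    __shared__ float4 smem[128];

    // Each thread asynchronously copies its own 16-byte element
    // global -> shared, bypassing the register file.
    __pipeline_memcpy_async(&smem[threadIdx.x], &gmem[threadIdx.x],
                            sizeof(float4));
    __pipeline_commit();       // close the current async-copy batch
    __pipeline_wait_prior(0);  // block until that batch has landed
    __syncthreads();

    // ... compute on smem ...
}
```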

I would like to request support for NVIDIA Ampere architecture GPUs in FlashMLA. I understand that many of the current optimizations are specific to Hopper GPUs, but having a lite version compatible with ...
  • ehartford
  • 2
  • Opened on Mar 3
  • #60

    def flash_mla():
        torch.cuda.synchronize()
        tile_scheduler_metadata, num_splits = get_mla_metadata(cache_seqlens, s_q * h_q // h_kv, h_kv)

I added a sync(), and found that the performance was much ...
  • pipul
  • Opened on Mar 3
  • #59
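The pitfall behind #59 is general to asynchronous APIs: CUDA kernel launches return immediately, so timing without a synchronize measures only launch overhead, and adding `torch.cuda.synchronize()` makes the timing include the actual work. A toy Python illustration of the same effect, using a single-worker thread pool as a stand-in for an in-order CUDA stream (the torch and get_mla_metadata names above come from the snippet and are not used here):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def work():
    time.sleep(0.05)  # stand-in for a GPU kernel

pool = ThreadPoolExecutor(max_workers=1)  # one worker ~ one in-order stream

# Timing without "synchronization": measures only the submission cost.
t0 = time.perf_counter()
fut = pool.submit(work)  # returns immediately, like an async kernel launch
launch_time = time.perf_counter() - t0

# Timing with "synchronization": waits for the work to actually finish,
# analogous to calling torch.cuda.synchronize() before reading the clock.
t0 = time.perf_counter()
pool.submit(work).result()
synced_time = time.perf_counter() - t0

fut.result()
pool.shutdown()
```

The "synced" number is larger not because the code got slower, but because the unsynchronized number never included the work at all.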

I couldn't find any paper that specifically introduces this library. If one exists, could someone kindly point me to it?
  • echoht
  • Opened on Feb 28
  • #53

[Image: https://github.com/user-attachments/assets/b1864bd1-7898-40b6-bd5f-33645cb91b0f] head_size here comes from q.sizes()[3]. But in modeling_deepseek.py of the DeepSeek-V3 model, ...
  • WangNorthSea
  • 1
  • Opened on Feb 27
  • #49

I'm curious about this. It seems we can overlap CUDA cores and tensor cores using warp specialization. But if it's just to overlap g2s and computation, is there any difference between the warp-specialization ...
  • sleepwalker2017
  • 1
  • Opened on Feb 27
  • #48