issues Search Results · repo:deepseek-ai/FlashMLA language:C++
42 results
https://github.com/deepseek-ai/FlashMLA/blob/b31bfe72a83ea205467b3271a5845440a03ed7cb/csrc/flash_api.cpp#L184
Hi all,
Just wondering why the shapes of O_accum and LSE_accum change from [numsplit, batch, ...
mingyangHao
- Opened on Apr 7
- #70
@sijiac Why add permanent false condition for determine?
https://github.com/deepseek-ai/FlashMLA/blob/b31bfe72a83ea205467b3271a5845440a03ed7cb/csrc/flash_fwd_mla_kernel.h#L528
LearnerInGithub
- 1
- Opened on Mar 6
- #64
@sijiac Hello everyone! I want to raise a question about the usage of the CUDA qualifier __launch_bounds__. In the CUDA documentation,
__launch_bounds__() only has 2 parameters: maxThreadsPerBlock and minBlocksPerMultiprocessor. ...
LearnerInGithub
- 2
- Opened on Mar 6
- #63
I read the FlashMLA code, and I have some questions:
1. Why not use TMA to load QK, instead of the SM80 copy_async?
2. To store data from registers to global memory, it uses shared memory to change ...
Idonthaveaname-wq
- 1
- Opened on Mar 5
- #61
I would like to request support for NVIDIA Ampere architecture GPUs in FlashMLA. I understand that many of the current
optimizations are specific to Hopper GPUs, but having a lite version compatible with ...
ehartford
- 2
- Opened on Mar 3
- #60
def flash_mla():
    torch.cuda.synchronize()
    tile_scheduler_metadata, num_splits = get_mla_metadata(cache_seqlens, s_q * h_q // h_kv, h_kv)
I added a sync(), and found that the performance was much ...
pipul
- Opened on Mar 3
- #59
[Image: https://github.com/user-attachments/assets/b1864bd1-7898-40b6-bd5f-33645cb91b0f]
head_size here comes from q.sizes()[3]. But in modeling_deepseek.py of the DeepSeek-V3 model, ...
WangNorthSea
- 1
- Opened on Feb 27
- #49
I'm curious about this.
It seems we can overlap CUDA cores and tensor cores using warp specialization.
But if it's just for overlapping g2s and computation, is there any difference between the warp-specialization ...
sleepwalker2017
- 1
- Opened on Feb 27
- #48