
[MLA] fake non persistent feature for mi355 #1472

Merged

HaiShaw merged 34 commits into main from mla_fake_non_persistent on Dec 7, 2025
Conversation

@Zzz9990 (Contributor) commented Nov 23, 2025

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

@Zzz9990 force-pushed the mla_fake_non_persistent branch from f76b43d to e4df5b8 on November 27, 2025 02:18
@Zzz9990 force-pushed the mla_fake_non_persistent branch 3 times, most recently from 0cb602a to 0a9d261 on November 27, 2025 10:00
@Zzz9990 force-pushed the mla_fake_non_persistent branch from 0a9d261 to 4775da4 on November 27, 2025 10:06
@Zzz9990 force-pushed the mla_fake_non_persistent branch from 4dff72f to 4111a21 on December 1, 2025 10:22
@shengnxu requested a review from valarLip on December 4, 2025 01:39
@Zzz9990 force-pushed the mla_fake_non_persistent branch from f095dc0 to 3d4907b on December 5, 2025 12:11
@Zzz9990 force-pushed the mla_fake_non_persistent branch from 3d4907b to 4c21c57 on December 5, 2025 12:13
@Zzz9990 force-pushed the mla_fake_non_persistent branch from d99b5ce to 80c19c3 on December 5, 2025 12:52
@Zzz9990 changed the title from "Mla fake non persistent" to "[MLA] fake non persistent feature for mi355" on Dec 6, 2025
valarLip previously approved these changes Dec 6, 2025

@valarLip (Collaborator) left a comment:

lgtm

@Zzz9990 force-pushed the mla_fake_non_persistent branch 6 times, most recently from 0ec6249 to 68e1463 on December 6, 2025 15:39
@Zzz9990 marked this pull request as ready for review December 7, 2025 02:24
@Zzz9990 force-pushed the mla_fake_non_persistent branch from 2dd47f7 to f9a14c4 on December 7, 2025 03:01
@HaiShaw (Collaborator) commented Dec 7, 2025

To merge once all CI has passed

@HaiShaw merged commit bb43ec0 into main on Dec 7, 2025

22 checks passed

@HaiShaw deleted the mla_fake_non_persistent branch December 7, 2025 19:48
zhuyuhua-v pushed a commit that referenced this pull request Dec 17, 2025
* update mla persistent to non-ps

* update mi355 mla kernel

* fix bugs

* update a8w8 ps kernel mi350

* update qlen=1

* remove debug code

* fix mtp

* fix ut

* update varlen

* fix final decoding reduce

* update

* temps

* Revert the mla_a8w8_qh16_qseqlen4_gqaratio16_ps.co to the #1233 version

* Increase the maximum kv split number to 32

* update ut

* change grid size into num_cu

* update adapt splits

* enable more nhead

* update split seqlen <= 4 merge

* update mla kernel

* update RDM for a8w8 kernel

* fix ps tail merge

* update a8w8 qh16 qseqlen1 ps kernel

* update ck

* update dp fake-non-ps

* update dp fake-non-ps for mtp

* update ut

* update kv_len < qo_len for cuda graph capture

* update

* ready for rw

* fix bf16 fp8 multi nhead

---------

Co-authored-by: Fang.Che <Fang.Che@amd.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: root <root@mia1-p01-g07.mia.tensorwave.lan>
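Two of the commit subjects above ("Increase the maximum kv split number to 32" and "fix final decoding reduce") concern split-KV decoding, where the KV cache is partitioned across workgroups and the per-split partial attention results are combined in a final reduce. A minimal NumPy sketch of the standard log-sum-exp merge used for that combination (an illustration of the general technique only, not this PR's kernel code; the function names here are hypothetical):

```python
import numpy as np

def attend(q, k, v):
    # Single-head attention over one KV chunk; returns the partial
    # output and the per-row log-sum-exp of the scores.
    s = q @ k.T / np.sqrt(q.shape[-1])           # (q_len, kv_len) scores
    m = s.max(axis=-1, keepdims=True)
    p = np.exp(s - m)
    lse = (m + np.log(p.sum(axis=-1, keepdims=True))).squeeze(-1)
    o = p @ v / p.sum(axis=-1, keepdims=True)
    return o, lse

def merge_splits(outs, lses):
    # Combine per-split partial outputs: weight each split by
    # exp(lse_i - max_lse), normalized across splits.
    lses = np.stack(lses)                        # (n_splits, q_len)
    outs = np.stack(outs)                        # (n_splits, q_len, d)
    m = lses.max(axis=0)
    w = np.exp(lses - m)
    w /= w.sum(axis=0)                           # normalize over splits
    return (w[..., None] * outs).sum(axis=0)     # (q_len, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 16))
k = rng.standard_normal((32, 16))
v = rng.standard_normal((32, 16))

# Attend to the KV cache in 4 splits of 8, then merge.
splits = [attend(q, k[i:i + 8], v[i:i + 8]) for i in range(0, 32, 8)]
merged = merge_splits([o for o, _ in splits], [l for _, l in splits])

# Reference: attention over the full, unsplit KV cache.
full, _ = attend(q, k, v)
assert np.allclose(merged, full)
```

Because each split carries its log-sum-exp, the merge is exact regardless of how many splits the KV cache is cut into, which is what makes raising the maximum split count a pure scheduling change rather than a numerics change.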
ZhangLirong-amd pushed a commit that referenced this pull request Dec 29, 2025
valarLip pushed a commit that referenced this pull request Mar 18, 2026
valarLip pushed a commit that referenced this pull request Mar 18, 2026
4 participants