[MLA] fake non-persistent feature for mi355 (#1472, Merged)
Conversation
Force-push history:
* f76b43d to e4df5b8
* 0cb602a to 0a9d261
* 0a9d261 to 4775da4
* 4dff72f to 4111a21
* f095dc0 to 3d4907b
* 3d4907b to 4c21c57
* d99b5ce to 80c19c3
* 0ec6249 to 68e1463
* 2dd47f7 to f9a14c4
HaiShaw (Collaborator) approved these changes on Dec 7, 2025: "To merge with all CI passed."
zhuyuhua-v pushed a commit that referenced this pull request on Dec 17, 2025. Squashed commit message:

* update mla persistent to non-ps
* update mi355 mla kernel
* fix bugs
* update a8w8 ps kernel mi350
* update qlen=1
* remove debug code
* fix mtp
* fix ut
* update varlen
* fix final decoding reduce
* update
* temps
* Revert the mla_a8w8_qh16_qseqlen4_gqaratio16_ps.co to the #1233 version
* Increase the maximum kv split number to 32
* update ut
* change grid size into num_cu
* update adapt splits
* enable more nhead
* update split seqlen <= 4 merge
* update mla kernel
* update RDM for a8w8 kernel
* fix ps tail merge
* update a8w8 qh16 qseqlen1 ps kernel
* update ck
* update dp fake-non-ps
* update dp fake-non-ps for mtp
* update ut
* update kv_len < qo_len for cuda graph capture
* update
* ready for rw
* fix bf16 fp8 multi nhead

Co-authored-by: Fang.Che <Fang.Che@amd.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: root <root@mia1-p01-g07.mia.tensorwave.lan>
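Several of the commits above touch split-KV decoding ("Increase the maximum kv split number to 32", "fix final decoding reduce", "update split seqlen <= 4 merge"). The standard way such partial results are combined is a log-sum-exp merge across KV splits. Below is a minimal NumPy sketch of that merge as an assumption about what the "decoding reduce" refers to; the function name `merge_splits` and the array layout are hypothetical, not taken from this repository's kernels.

```python
import numpy as np

def merge_splits(partial_out, partial_lse):
    """Merge partial attention outputs from KV splits (hypothetical sketch).

    partial_out: (num_splits, num_heads, head_dim) per-split softmax outputs
    partial_lse: (num_splits, num_heads) log-sum-exp of each split's scores
    """
    m = partial_lse.max(axis=0)                # shared max for numerical stability
    w = np.exp(partial_lse - m)                # per-split softmax denominators
    denom = w.sum(axis=0)                      # global softmax denominator
    out = (w[..., None] * partial_out).sum(axis=0) / denom[:, None]
    return out
```

Because each split carries its own log-sum-exp, the merged result is exactly the attention output computed over the full KV cache, which is why the number of splits (here raised to 32) can be tuned freely for occupancy without changing the math.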
ZhangLirong-amd pushed a commit that referenced this pull request on Dec 29, 2025 (same squashed commit message as above).
valarLip pushed two commits that referenced this pull request on Mar 18, 2026 (same squashed commit message as above).
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist