
[MLA] fake non persistent feature for mi355 #1472

Merged

HaiShaw merged 34 commits into main from mla_fake_non_persistent on Dec 7, 2025
Conversation

@Zzz9990 (Contributor) commented Nov 23, 2025

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

@Zzz9990 force-pushed the mla_fake_non_persistent branch from f76b43d to e4df5b8 on November 27, 2025 02:18
@Zzz9990 force-pushed the mla_fake_non_persistent branch 3 times, most recently from 0cb602a to 0a9d261 on November 27, 2025 10:00
@Zzz9990 force-pushed the mla_fake_non_persistent branch from 0a9d261 to 4775da4 on November 27, 2025 10:06
@Zzz9990 force-pushed the mla_fake_non_persistent branch from 4dff72f to 4111a21 on December 1, 2025 10:22
@shengnxu requested a review from valarLip on December 4, 2025 01:39
@Zzz9990 force-pushed the mla_fake_non_persistent branch from f095dc0 to 3d4907b on December 5, 2025 12:11
@Zzz9990 force-pushed the mla_fake_non_persistent branch from 3d4907b to 4c21c57 on December 5, 2025 12:13
@Zzz9990 force-pushed the mla_fake_non_persistent branch from d99b5ce to 80c19c3 on December 5, 2025 12:52
@Zzz9990 changed the title from "Mla fake non persistent" to "[MLA] fake non persistent feature for mi355" on Dec 6, 2025
valarLip previously approved these changes Dec 6, 2025

@valarLip (Collaborator) left a comment:

lgtm

@Zzz9990 force-pushed the mla_fake_non_persistent branch 6 times, most recently from 0ec6249 to 68e1463 on December 6, 2025 15:39
@Zzz9990 marked this pull request as ready for review December 7, 2025 02:24
@Zzz9990 force-pushed the mla_fake_non_persistent branch from 2dd47f7 to f9a14c4 on December 7, 2025 03:01
@HaiShaw (Collaborator) commented Dec 7, 2025

To merge once all CI has passed

@HaiShaw merged commit bb43ec0 into main on Dec 7, 2025

22 checks passed

@HaiShaw deleted the mla_fake_non_persistent branch December 7, 2025 19:48
zhuyuhua-v pushed a commit that referenced this pull request Dec 17, 2025
* update mla persistent to non-ps

* update mi355 mla kernel

* fix bugs

* update a8w8 ps kernel mi350

* update qlen=1

* remove debug code

* fix mtp

* fix ut

* update varlen

* fix final decoding reduce

* update

* temps

* Revert the mla_a8w8_qh16_qseqlen4_gqaratio16_ps.co to the #1233 version

* Increase the maximum kv split number to 32

* update ut

* change grid size into num_cu

* update adapt splits

* enable more nhead

* update split seqlen <= 4 merge

* update mla kernel

* update RDM for a8w8 kernel

* fix ps tail merge

* update a8w8 qh16 qseqlen1 ps kernel

* update ck

* update dp fake-non-ps

* update dp fake-non-ps for mtp

* update ut

* update kv_len < qo_len for cuda graph capture

* update

* ready for rw

* fix bf16 fp8 multi nhead

---------

Co-authored-by: Fang.Che <Fang.Che@amd.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: root <root@mia1-p01-g07.mia.tensorwave.lan>
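Two of the commit subjects above ("Increase the maximum kv split number to 32" and "fix final decoding reduce") concern split-KV decoding, where the KV cache is partitioned across workgroups and the per-split partial attention results are combined in a final reduce. A minimal NumPy sketch of the standard log-sum-exp merge used for that combination (an illustration of the general technique only, not this PR's kernel code; the function names here are hypothetical):

```python
import numpy as np

def attend(q, k, v):
    # Single-head attention over one KV chunk; returns the partial
    # output and the per-row log-sum-exp of the scores.
    s = q @ k.T / np.sqrt(q.shape[-1])           # (q_len, kv_len) scores
    m = s.max(axis=-1, keepdims=True)
    p = np.exp(s - m)
    lse = (m + np.log(p.sum(axis=-1, keepdims=True))).squeeze(-1)
    o = p @ v / p.sum(axis=-1, keepdims=True)
    return o, lse

def merge_splits(outs, lses):
    # Combine per-split partial outputs: weight each split by
    # exp(lse_i - max_lse), normalized across splits.
    lses = np.stack(lses)                        # (n_splits, q_len)
    outs = np.stack(outs)                        # (n_splits, q_len, d)
    m = lses.max(axis=0)
    w = np.exp(lses - m)
    w /= w.sum(axis=0)                           # normalize over splits
    return (w[..., None] * outs).sum(axis=0)     # (q_len, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 16))
k = rng.standard_normal((32, 16))
v = rng.standard_normal((32, 16))

# Attend to the KV cache in 4 splits of 8, then merge.
splits = [attend(q, k[i:i + 8], v[i:i + 8]) for i in range(0, 32, 8)]
merged = merge_splits([o for o, _ in splits], [l for _, l in splits])

# Reference: attention over the full, unsplit KV cache.
full, _ = attend(q, k, v)
assert np.allclose(merged, full)
```

Because each split carries its log-sum-exp, the merge is exact regardless of how many splits the KV cache is cut into, which is what makes raising the maximum split count a pure scheduling change rather than a numerics change.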
ZhangLirong-amd pushed a commit that referenced this pull request Dec 29, 2025
valarLip pushed a commit that referenced this pull request Mar 18, 2026
valarLip pushed a commit that referenced this pull request Mar 18, 2026
4 participants