Matrix B skips LDS by ltqin · Pull Request #264 · ROCm/composable_kernel

ltqin · 2022-05-30T08:32:37Z

No description provided.

rosenrodt · 2022-05-31T09:13:28Z

+
+    // return block_id to C matrix tile idx (m0, n0) mapping
+    __host__ __device__ static constexpr auto
+    MakeDefaultBlock2CTileMap(const CGridDesc_M_N& c_grid_desc_m_n, index_t M01, index_t N01)


This function has been moved into dedicated classes. Please see PR #235 for reference.
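For context on the mapping mentioned above: a block-to-C-tile map turns a linear workgroup id into the (m0, n0) index of the C tile that workgroup computes. In this sketch, M01/N01 group neighbouring tiles together. The standalone C++ code below is a hypothetical illustration, not the class from PR #235. `Block2CTileMapSketch`, `TileIdx`, the `index_t` alias, and the chunked visiting order are assumptions, and the code requires M0 and N0 to be multiples of M01 and N01.

```cpp
#include <cstdint>

using index_t = std::int32_t; // assumption: stand-in for CK's index_t

struct TileIdx
{
    index_t m0; // tile index along M
    index_t n0; // tile index along N
};

// Hypothetical sketch, not CK's Block2CTileMap classes.
// M0 x N0 is the grid of C tiles. Tiles are visited in M01 x N01 chunks so
// that consecutive block ids stay close together in both M and N.
// Assumes M0 % M01 == 0 and N0 % N01 == 0.
struct Block2CTileMapSketch
{
    index_t M0, N0, M01, N01;

    constexpr index_t NumBlocks() const { return M0 * N0; }

    constexpr TileIdx operator()(index_t block_id) const
    {
        const index_t chunk_size = M01 * N01;
        const index_t chunk_id   = block_id / chunk_size; // which chunk
        const index_t in_chunk   = block_id % chunk_size; // position inside it
        const index_t N00        = N0 / N01;              // chunks along N

        const index_t m00 = chunk_id / N00;
        const index_t n00 = chunk_id % N00;
        const index_t m01 = in_chunk / N01;
        const index_t n01 = in_chunk % N01;

        return TileIdx{m00 * M01 + m01, n00 * N01 + n01};
    }
};
```

With M0 = N0 = 4 and M01 = N01 = 2, block ids 0 to 3 cover the top-left 2x2 group of tiles, and block 4 starts the next group along N at tile (0, 2).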

rosenrodt · 2022-05-31T09:20:02Z

+                do
+                {
+                    a_blockwise_copy.RunRead(a_grid_desc_k0_m_k1, a_grid_buf);
+                    b_threadwise_copy.Run(b_grid_desc_k0_k1_k2_n0_n1_n2_n3_k3,


I suspect we could configure the B BlockwiseCopy so that it simply skips the load into LDS. That way we could reuse the same GridwiseGemmPipeline_v1 and BlockwiseGemm. Is that doable?
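The loop excerpted above issues `a_blockwise_copy.RunRead` for A and `b_threadwise_copy.Run` for B, with B going from global memory directly to registers instead of through LDS. The commit log mentions a double register buffer and a pipeline tail. The CPU-side sketch below illustrates that prologue / main-loop / tail structure with two register buffers per operand. It is hypothetical: `load_chunk`, `consume_chunk`, `pipelined_dot`, and `KPerStep` are illustrative names, and a dot product stands in for the blockwise GEMM.

```cpp
#include <cstddef>

constexpr std::size_t KPerStep = 4; // assumption: elements fetched per step

// Stand-in for a global-memory -> register copy of chunk k0.
constexpr void load_chunk(const float* src, std::size_t k0, float (&dst)[KPerStep])
{
    for(std::size_t i = 0; i < KPerStep; ++i)
        dst[i] = src[k0 * KPerStep + i];
}

// Stand-in for the math on one chunk already held in registers.
constexpr float consume_chunk(const float (&a)[KPerStep], const float (&b)[KPerStep])
{
    float acc = 0.f;
    for(std::size_t i = 0; i < KPerStep; ++i)
        acc += a[i] * b[i];
    return acc;
}

// Dot product of length k using two register buffers per operand.
// Requires k to be a positive multiple of KPerStep.
constexpr float pipelined_dot(const float* a, const float* b, std::size_t k)
{
    const std::size_t num_k0 = k / KPerStep;
    float a_buf[2][KPerStep] = {};
    float b_buf[2][KPerStep] = {};
    float acc = 0.f;

    // Prologue: fetch chunk 0.
    load_chunk(a, 0, a_buf[0]);
    load_chunk(b, 0, b_buf[0]);

    // Main loop: issue the fetch of chunk k0 + 1 before the math on chunk k0.
    // On a GPU this ordering lets load latency overlap the math; here it
    // simply runs in program order.
    std::size_t k0 = 0;
    for(; k0 + 1 < num_k0; ++k0)
    {
        const std::size_t cur = k0 % 2;
        const std::size_t nxt = 1 - cur;
        load_chunk(a, k0 + 1, a_buf[nxt]);
        load_chunk(b, k0 + 1, b_buf[nxt]);
        acc += consume_chunk(a_buf[cur], b_buf[cur]);
    }

    // Tail: the last chunk is already loaded, so only the math remains.
    acc += consume_chunk(a_buf[k0 % 2], b_buf[k0 % 2]);
    return acc;
}
```

With three chunks, chunk 0 is fetched in the prologue, chunks 1 and 2 are fetched inside the loop one step ahead of their use, and chunk 2 is consumed in the tail.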

rosenrodt · 2022-05-31T09:27:57Z

+    __host__ __device__ static constexpr index_t
+    CalculateGridSize(const CGridDesc_M_N& c_grid_desc_m_n)
+    {
+        const auto M = c_grid_desc_m_n.GetLength(I0);


This functionality has also been moved into classes. See PR #235.
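A grid-size calculation of this kind usually launches one workgroup per C tile. The following is a minimal sketch under that assumption. `CalculateGridSizeSketch` and `ceil_div` are hypothetical, as is passing `MPerBlock`/`NPerBlock` as runtime arguments rather than reading M and N from a descriptor and the tile sizes from template parameters.

```cpp
#include <cstdint>

using index_t = std::int32_t; // assumption: stand-in for CK's index_t

// Round-up integer division (hypothetical helper).
constexpr index_t ceil_div(index_t x, index_t y) { return (x + y - 1) / y; }

// Hypothetical sketch: one workgroup per MPerBlock x NPerBlock tile of C.
// Rounding up also covers partial tiles at the M/N edges. A kernel that
// requires M % MPerBlock == 0 and N % NPerBlock == 0 gets the same result.
constexpr index_t CalculateGridSizeSketch(index_t M, index_t N, index_t MPerBlock, index_t NPerBlock)
{
    const index_t M0 = ceil_div(M, MPerBlock); // tiles along M
    const index_t N0 = ceil_div(N, NPerBlock); // tiles along N
    return M0 * N0;
}
```

For example, M = N = 1000 with 128 x 128 tiles gives 8 x 8 = 64 workgroups.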

ltqin added 30 commits May 7, 2022 22:17

start

71d974b

read for gridwise gemm

b5b8562

add MakeBGridDescriptor_K0_N0_N1_N2_N3_K1

c08dcaa

add thread copy desc and register buffer

7d42a6d

add K0PerBlock dim

673b30c

add read global data

d9240c6

finish gridwise gemm

0adfd1a

finish blockwise gemm

8683437

add print data

21885e9

add smallest config

64c5889

add compare code for gridwise gemm

4e81617

fix NXdlPerWave

2159921

fix k0perthread and gridwise gemm main loop

071ca12

remove b matrix lds alloc

6b4c298

fix name

4f88629

add test code

8d4b51c

create b_grid_desc_k0_k1_k2_n0_n1_n2_n3_k3 from parameter

cf360b7

add double register

53963d6

modify b_thread_desc_

a88005b

add float

215b177

fp16 tag

c5c32b4

add tail for pipeline

1d478b9

finish main loop

5173bdd

optimize main loop

09b9767

start clear gridwise gemm

fb6dafe

clear code

6de61c0

clear redundant code

28890a1

change file name

8dd8936

change file name

7d85d04

Merge branch 'develop' into bmatrix_skip_lds

f9c478e

ltqin requested a review from zjing14 May 30, 2022 11:28

ltqin added 2 commits May 30, 2022 19:39

fix bug after merge develop

b571256

Merge branch 'develop' into bmatrix_skip_lds

c3de33f

rosenrodt reviewed May 31, 2022

ltqin added 4 commits June 1, 2022 10:41

fix input parameters

993ec45

using MultiK0 control b load data loop

428ae72

fix some config

179c561

fix verify data (multik0)

346e837

ltqin closed this Jun 20, 2022

illsilin deleted the bmatrix_skip_lds branch December 7, 2023 18:41

ltqin wants to merge 36 commits into develop from bmatrix_skip_lds

2 participants
