Simplify kernel argument of device operator Device(Batched)GemmXdl<> #723

Merged
zjing14 merged 86 commits into develop from feature/simplify-karg-for-device-gemm-xdl-improved on Jun 1, 2023

Conversation

@poyenc
Contributor

@poyenc poyenc commented May 24, 2023

I tried to reduce the kernel arguments for the following device operators:

  • DeviceGemmXdl<> (create descriptors on device side)
  • DeviceBatchedGemmXdl<> (create descriptors on device side)
  • DeviceConv2dFwdXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K<>
  • DeviceConv2dBwdDataXdl_Input_N_Hi_Wi_C_Weight_K_Y_X_C_Output_N_Ho_Wo_K<>
  • DeviceConvNdBwdDataNwcKxcNwk_Xdl<>

In general, the stateless elementwise operation objects and Block2CTileMap are redundant kernel parameters, so we can safely remove them from most device operators. For now, I have only updated the device operators related to GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v2r3<>.
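To illustrate why a stateless functor is a redundant argument, here is a minimal sketch in plain host C++ (not HIP, and not the code from this PR). The names `PassThroughOp`, `kernel_with_op_arg`, `kernel_without_op_arg`, and `same_result` are hypothetical; the loops stand in for a device kernel body.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical stateless elementwise operation: it has no data members.
struct PassThroughOp
{
    float operator()(float x) const { return x; }
};

// Before: the functor travels as an argument even though it carries no state.
template <typename ElementOp>
void kernel_with_op_arg(const float* in, float* out, int n, ElementOp op)
{
    for(int i = 0; i < n; ++i)
        out[i] = op(in[i]);
}

// After: only the functor type is passed (as a template parameter), and the
// object is default-constructed at the point of use.
template <typename ElementOp>
void kernel_without_op_arg(const float* in, float* out, int n)
{
    static_assert(std::is_empty<ElementOp>::value &&
                      std::is_default_constructible<ElementOp>::value,
                  "only stateless, default-constructible functors can be dropped");
    const ElementOp op{};
    for(int i = 0; i < n; ++i)
        out[i] = op(in[i]);
}

// Runs both variants on the same input and compares the outputs.
inline bool same_result()
{
    std::vector<float> in{1.f, 2.f, 3.f}, a(3), b(3);
    kernel_with_op_arg(in.data(), a.data(), 3, PassThroughOp{});
    kernel_without_op_arg<PassThroughOp>(in.data(), b.data(), 3);
    return a == b;
}
```

The `static_assert` is the part that matters in this sketch: the argument can only be dropped when the functor really has no state, which is why "most" rather than "all" device operators qualify.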

To improve code reuse, I decoupled BlockToCTileMap_M00_N0_M01Adapt<> from its third type argument, CGridDesc_M_N. This means we no longer need to pass CGridDesc_M_N in order to create tile mapping objects.
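The idea of a tile map that no longer depends on a grid descriptor type can be sketched as follows. This is a simplified, hypothetical `SimpleTileMap` built from the problem sizes M and N only; it uses a plain row-major block-to-tile mapping and is not the actual mapping implemented by BlockToCTileMap_M00_N0_M01Adapt<>.

```cpp
#include <cassert>
#include <utility>

// Hypothetical tile map: constructed from M and N alone, with no dependency
// on a C-grid descriptor type.
template <int MPerBlock, int NPerBlock>
struct SimpleTileMap
{
    int M0; // number of tiles along M
    int N0; // number of tiles along N

    SimpleTileMap(int M, int N)
        : M0((M + MPerBlock - 1) / MPerBlock), N0((N + NPerBlock - 1) / NPerBlock)
    {
    }

    // Total number of workgroups needed to cover the C matrix.
    int grid_size() const { return M0 * N0; }

    // Map a linear block id to (tile index along M, tile index along N),
    // assuming a simple row-major ordering of tiles.
    std::pair<int, int> tile_index(int block_id) const
    {
        return std::make_pair(block_id / N0, block_id % N0);
    }
};
```

Because the object only needs two integers, the host can compute the grid size and the device can rebuild the same map from M and N, without either side constructing a descriptor first.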

I also split the GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v2r3<> template into two templates (along with the corresponding entry kernels), each dedicated to a different type of device operator.

  • GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v2r3<>: For convolution device operators. It has no default descriptor creators; users have to create the descriptors on their own.
  • GridwiseGemm_k0mk1_k0nk1_mn_xdlops_v2r3_ext<>: For GEMM device operators. It has default descriptor creators for the A/B/C matrices. Users have to provide the extra A/B/CLayout and GemmSpecialization template arguments.
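The split described above can be sketched roughly as follows. All names here (`Desc2D`, `GridwiseGemmBase`, `GridwiseGemmExt`, `make_a_desc`) are hypothetical and heavily simplified. Whether the real `_ext` template derives from the base one is an assumption made for brevity, and only a single layout parameter is shown.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical layout tags.
struct RowMajor {};
struct ColumnMajor {};

// Hypothetical 2D descriptor: sizes plus strides.
struct Desc2D
{
    int rows, cols, row_stride, col_stride;
    int offset(int r, int c) const { return r * row_stride + c * col_stride; }
};

// Base variant: works only with descriptors supplied by the caller
// (the convolution-style use case).
struct GridwiseGemmBase
{
    static float load(const float* p, const Desc2D& d, int r, int c)
    {
        return p[d.offset(r, c)];
    }
};

// "Ext" variant: adds a default descriptor creator selected by a layout tag
// (the GEMM-style use case). Inheritance is an assumption of this sketch.
template <typename ALayout>
struct GridwiseGemmExt : GridwiseGemmBase
{
    static Desc2D make_a_desc(int M, int K, int stride)
    {
        if(std::is_same<ALayout, RowMajor>::value)
            return Desc2D{M, K, stride, 1}; // rows are `stride` elements apart
        return Desc2D{M, K, 1, stride};     // columns are `stride` elements apart
    }
};
```

With this shape, a GEMM operator only passes sizes and strides and lets the layout tag pick the descriptor, while a convolution operator keeps building its own descriptors and hands them to the base variant.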

I found that moving the descriptor creation logic into device functions can make the code less manageable. I did this for DeviceConvNdBwdDataNwcKxcNwk_Xdl<> on branch feature/simplify-karg-for-device-gemm-xdl, but the code is hard to read (the Argument class becomes tediously long) and it has had an unknown correctness issue since revision d4efc6a (I think it will take more time to troubleshoot).

This patch requires changes from the following pull requests:

poyenc added 30 commits May 4, 2023 13:29
@poyenc
Contributor Author

poyenc commented May 24, 2023

It looks like we have had a correctness issue with DeviceGemm_Xdl_CShuffle<> since revision 5710567 (should pass the tests in #696), but I didn't touch DeviceGemm_Xdl_CShuffle<> in this PR.
I will update the underlying branch of #696 and run CI again.

@poyenc
Contributor Author

poyenc commented May 29, 2023

I just fixed the CI server test failure issue in #696.

@zjing14 zjing14 merged commit 9eae73d into develop Jun 1, 2023
@illsilin illsilin deleted the feature/simplify-karg-for-device-gemm-xdl-improved branch December 7, 2023 18:56
2 participants