[Navi3x] Add Device Operations#567

Merged
asroy merged 43 commits into develop from navi3x_mD_batchedGEMM_GroupConvFwd
Feb 15, 2023

@aska-0096
Contributor

@aska-0096 aska-0096 commented Jan 30, 2023

Three Device Operations Added

1. DeviceGemmMultipleD_Wmma_Cshuffle

  • example_gemm_bilinear_wmma_fp16

2. DeviceBatchedContractionMultipleD_Wmma_Cshuffle

  • example_batched_gemm_bias_e_permute_wmma_fp16

3. DeviceGroupedConvFwdMultipleD_Wmma_Cshuffle

  • example_grouped_conv_fwd_bias_relu_add_wmma_fp16
  • Depthwise convolution performance is very low and needs further optimization.

The examples above pass with the latest version of the amd-stg-open compiler.

@aska-0096 aska-0096 added the enhancement and urgency_high labels Jan 30, 2023
@aska-0096 aska-0096 self-assigned this Jan 30, 2023
@asroy
Contributor

asroy commented Jan 30, 2023

@aska-0096 Does this PR also pass with the compiler currently used by CI? If not, we may need to update the CI compiler again.

cc @illsilin

@illsilin
Collaborator

Looks like we got a couple of new test failures in CI for this branch:

[2023-01-30T12:16:01.378Z] The following tests FAILED:
[2023-01-30T12:16:01.378Z] 11 - example_gemm_bilinear_wmma_fp16 (Child aborted)
[2023-01-30T12:16:01.378Z] 80 - example_grouped_conv_fwd_bias_relu_add_wmma_fp16 (Child aborted)

@aska-0096
Contributor Author

aska-0096 commented Jan 31, 2023

Hi @asroy @illsilin
Confirmed: the two examples mentioned above fail on 5.3.1. They work on 5.4.1, though with somewhat lower performance than on 5.5.0 or the latest amd-stg-open.

@aska-0096 aska-0096 requested a review from illsilin January 31, 2023 08:00
@aska-0096
Contributor Author

Hi @illsilin @asroy
Is there any progress on upgrading the CI compiler? The AITemplate side needs these operations to enable ResNet50 on Navi3x.

@illsilin
Collaborator

illsilin commented Feb 7, 2023

I have updated the CI compiler. Please sync your branch with develop branch.

@aska-0096
Contributor Author

The Docker image used by the run I triggered is still rocm/composable_kernel:ck_ub20.04_rocm5.3_release, the same as the older one. I think the compiler has not been upgraded yet. @illsilin

@illsilin
Collaborator

illsilin commented Feb 8, 2023

What happened is that I changed the default compiler values in the Jenkins parameters. It usually takes 20-30 minutes for Jenkins to pick those up after the change has been merged, so if CI is launched before that, it still uses the old defaults. I'll restart your branch manually now, and it will use the new compiler defaults.

@illsilin
Collaborator

illsilin commented Feb 8, 2023

OK, so the results are in: there are 3 failures:

[2023-02-08T17:13:42.060Z] The following tests FAILED:
[2023-02-08T17:13:42.060Z] 11 - example_gemm_bilinear_wmma_fp16 (Child aborted)
[2023-02-08T17:13:42.060Z] 80 - example_grouped_conv_fwd_bias_relu_add_wmma_fp16 (Child aborted)
[2023-02-08T17:13:42.060Z] 150 - test_grouped_convnd_bwd_weight (Failed)

Test 150 seems sensitive: I re-ran it locally with your branch and it passed. In CI, the test result differed from the baseline by only 1 (879 vs. 880), so this is most likely a round-off error.

The other two tests, however, should not have been launched on MI100/200. So you need to add a check somewhere to make sure those tests are only triggered "#if defined(gfx1100)".
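
A minimal sketch of such a guard, done at run time rather than with the preprocessor: the example checks the architecture name before launching any WMMA kernel and exits cleanly on unsupported hardware. The helper names `is_wmma_supported` and `run_wmma_example`, and passing the architecture as a plain string, are assumptions for illustration and are not part of this PR. In a real example the string would come from the HIP device properties (for instance the `gcnArchName` field of `hipDeviceProp_t`). Only gfx1100 is listed because it is the only target named in this thread.

```cpp
#include <algorithm>
#include <array>
#include <iostream>
#include <string_view>

// Hypothetical helper (not part of the PR): returns true when the given
// architecture name is one the WMMA examples should run on.
// Architecture strings may carry feature suffixes such as
// "gfx90a:sramecc+:xnack-", so only the part before the first ':' is compared.
bool is_wmma_supported(std::string_view arch_name)
{
    // Assumption: only gfx1100 is listed, since it is the only target named in the thread.
    constexpr std::array<std::string_view, 1> supported = {"gfx1100"};

    const std::string_view base = arch_name.substr(0, arch_name.find(':'));
    return std::find(supported.begin(), supported.end(), base) != supported.end();
}

// Hypothetical wrapper: returns the exit code an example would use.
int run_wmma_example(std::string_view arch_name)
{
    if(!is_wmma_supported(arch_name))
    {
        std::cout << "Skipping WMMA example on unsupported device " << arch_name << '\n';
        return 0; // Exit cleanly so that CI does not report "Child aborted".
    }
    // ... the WMMA kernel launch would go here ...
    return 0;
}
```

With a guard like this, a run on MI100 (gfx908) or MI200 (gfx90a) would report a skip instead of aborting. A build-time restriction on the target list goes one step further and avoids building the examples at all.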

@illsilin
Collaborator

illsilin commented Feb 8, 2023

One option is to make sure those tests are only built if the appropriate GPU architecture is on the list of targets:

diff --git a/example/02_gemm_bilinear/CMakeLists.txt b/example/02_gemm_bilinear/CMakeLists.txt
index 425029c0..6266af0a 100644
--- a/example/02_gemm_bilinear/CMakeLists.txt
+++ b/example/02_gemm_bilinear/CMakeLists.txt
@@ -1,2 +1,4 @@
 add_example_executable(example_gemm_bilinear_xdl_fp16 gemm_bilinear_xdl_fp16.cpp)
-add_example_executable(example_gemm_bilinear_wmma_fp16 gemm_bilinear_wmma_fp16.cpp)
+if(GPU_TARGETS MATCHES gfx1100)
+    add_example_executable(example_gemm_bilinear_wmma_fp16 gemm_bilinear_wmma_fp16.cpp)
+endif()

@aska-0096
Contributor Author

Interesting. I confirmed that CI failed because the examples ran on an unsupported GPU. However, 2 of the 4 examples that include WMMA passed without compile or runtime errors.
Let me first try your suggestion of adding an architecture restriction, as I already do in the test folder.

@aska-0096
Contributor Author

Hi @illsilin, CI passed.
cc: @asroy

@aska-0096
Contributor Author

@asroy
Just a reminder about this PR. I believe the device ops it adds would be valuable to the AIT side.
I would be grateful if you could take a look and consider merging it.

@asroy asroy merged commit 0cfda84 into develop Feb 15, 2023
@illsilin illsilin deleted the navi3x_mD_batchedGEMM_GroupConvFwd branch December 14, 2023 17:04