add igemm bwd v4r1 xdlops kernel#167
Conversation
|
Tidy error: |
modified by the latest commit |
Thanks. |
|
perf db attached |
|
@daniellowell Hi, Daniel, for the find_db update label, what is supposed to be provided? |
|
@shaojiewang Merge conflicts. |
ok. |
|
@asroy Hi, Chao, could you please re-review this PR in pub repo? Thanks. |
|
@asroy Hi, Chao, could you please check the performance data in the following link? |
|
The cases you posted seem sufficient to me, but if you have already tested other cases, please post everything you have collected (all cases you tested, all solvers you tested), not only the "good" performing cases. Also run some tests with hip-clang and check whether there are any correctness or performance issues; you can use the same test cases you posted.
@asroy Yes, OK, I will post them. Thanks.
@asroy Yes, I will test this PR with hip-clang today. Thanks |
|
Shall we avoid posting Confluence links? https://github.com/AMDComputeLibraries/MLOpen/issues/2522#issue-592311067
Yes. |
|
For future reference, I see 3 files in this review marked to be changed but no changes inside. @shaojiewang if you are touching these files for some reason, could you please not do 'git add' when you haven't done any material change? Otherwise, they show up as changed files with no changes inside. src/include/miopen/execution_context.hpp |
@TejashShah Yes, OK. Thank you. |
@asroy Hi, Chao, I updated the test data in that link. Could you please re-review it?
|
@shaojiewang PR LGTM. I checked your perf numbers. Comparing hcc vs hip-clang performance, I'm seeing a significant performance regression in some configurations for v1r1-xdlops. Please open a JIRA ticket and copy a link here. |
Yes, OK. The JIRA ticket is SWDEV-236800.
This PR aims to improve performance for smaller K when the stride is 1.
Add an xdlops implicit gemm kernel for backward data:
gridwise_convolution_backward_data_implicit_gemm_v4r1_xdlops_nchw_kcyx_nkhw
Testing
Tests run under these conditions:
export MIOPEN_DEBUG_CONV_WINOGRAD=0
export MIOPEN_DEBUG_CONV_FFT=0
export MIOPEN_DEBUG_CONV_DIRECT=0
export MIOPEN_DEBUG_CONV_GEMM=0
export MIOPEN_DEBUG_CONV_SCGEMM=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=1
export MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_FIRST_SOLUTION=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_XDLOPS=1
export MIOPEN_DEBUG_IMPLICIT_GEMM_XDLOPS_INLINE_ASM=1
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_XDLOPS_EMULATE=0
export MIOPEN_FIND_ENFORCE=4
export MIOPEN_LOG_LEVEL=6
export ROCBLAS_LAYER=3
export KMDUMPISA=1
export KMDUMPLLVM=1
./bin/MIOpenDriver conv -F 2 -n 128 -c 256 -H 17 -W 17 -k 128 -y 1 -x 7 -p 0 -q 3 -u 1 -v 1 -l 1 -j 1 -t 1
./bin/MIOpenDriver conv -F 2 -n 128 -c 512 -H 17 -W 17 -k 128 -y 1 -x 7 -p 0 -q 3 -u 1 -v 1 -l 1 -j 1 -t 1
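
For anyone reproducing these runs, the settings and the two driver invocations above can be wrapped in a small script. This is only a sketch: the `DRIVER` variable, the `driver_args` helper, and the dry-run fallback (print the command when the binary is not found) are illustrative assumptions and not part of this PR. The environment values and driver flags are copied unchanged from the list above; the two cases differ only in `-c`.

```shell
#!/bin/sh
# Sketch only: DRIVER, driver_args, and the dry-run fallback are
# hypothetical conveniences, not something this PR provides.

# Disable the other convolution paths so only implicit GEMM is considered
# (values copied from the test conditions above).
export MIOPEN_DEBUG_CONV_WINOGRAD=0
export MIOPEN_DEBUG_CONV_FFT=0
export MIOPEN_DEBUG_CONV_DIRECT=0
export MIOPEN_DEBUG_CONV_GEMM=0
export MIOPEN_DEBUG_CONV_SCGEMM=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=1
export MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_FIRST_SOLUTION=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_XDLOPS=1
export MIOPEN_DEBUG_IMPLICIT_GEMM_XDLOPS_INLINE_ASM=1
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_XDLOPS_EMULATE=0
export MIOPEN_FIND_ENFORCE=4
export MIOPEN_LOG_LEVEL=6
export ROCBLAS_LAYER=3
export KMDUMPISA=1
export KMDUMPLLVM=1

# Path to the driver binary; override with DRIVER=/path/to/MIOpenDriver.
DRIVER="${DRIVER:-./bin/MIOpenDriver}"

# Print the driver arguments for one case; only the -c value varies.
driver_args() {
  echo "conv -F 2 -n 128 -c $1 -H 17 -W 17 -k 128 -y 1 -x 7 -p 0 -q 3 -u 1 -v 1 -l 1 -j 1 -t 1"
}

for c in 256 512; do
  args=$(driver_args "$c")
  echo "+ $DRIVER $args"
  if [ -x "$DRIVER" ]; then
    # Word splitting is intentional: args holds only simple flag/value tokens.
    # shellcheck disable=SC2086
    "$DRIVER" $args
  fi
done
```

Run it from the build directory so that `./bin/MIOpenDriver` resolves, or set `DRIVER` explicitly. Without the binary it only echoes the two commands, which is handy for checking the flags before a long tuning run.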