Skip to content

add igemm bwd v4r1 xdlops kernel#167

Merged
daniellowell merged 26 commits intodevelopfrom
igemm_bwd_xdlops_v4r1
May 20, 2020
Merged

add igemm bwd v4r1 xdlops kernel#167
daniellowell merged 26 commits intodevelopfrom
igemm_bwd_xdlops_v4r1

Conversation

@shaojiewang
Copy link
Copy Markdown
Contributor

@shaojiewang shaojiewang commented Apr 20, 2020

This PR aims to improve the performance on smaller K when stride is 1

Add an xdlops implicit gemm kernel for backward data:
gridwise_convolution_backward_data_implicit_gemm_v4r1_xdlops_nchw_kcyx_nkhw

Testing
Tests runs on these conditions:
export MIOPEN_DEBUG_CONV_WINOGRAD=0
export MIOPEN_DEBUG_CONV_FFT=0
export MIOPEN_DEBUG_CONV_DIRECT=0
export MIOPEN_DEBUG_CONV_GEMM=0
export MIOPEN_DEBUG_CONV_SCGEMM=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM=1
export MIOPEN_DEBUG_IMPLICIT_GEMM_FIND_FIRST_SOLUTION=0
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_XDLOPS=1
export MIOPEN_DEBUG_IMPLICIT_GEMM_XDLOPS_INLINE_ASM=1
export MIOPEN_DEBUG_CONV_IMPLICIT_GEMM_XDLOPS_EMULATE=0
export MIOPEN_FIND_ENFORCE=4
export MIOPEN_LOG_LEVEL=6
export ROCBLAS_LAYER=3
export KMDUMPISA=1
export KMDUMPLLVM=1

./bin/MIOpenDriver conv -F 2 -n 128 -c 256 -H 17 -W 17 -k 128 -y 1 -x 7 -p 0 -q 3 -u 1 -v 1 -l 1 -j 1 -t 1
./bin/MIOpenDriver conv -F 2 -n 128 -c 512 -H 17 -W 17 -k 128 -y 1 -x 7 -p 0 -q 3 -u 1 -v 1 -l 1 -j 1 -t 1

@shaojiewang shaojiewang requested review from asroy and zjing14 April 20, 2020 02:28
@shaojiewang shaojiewang deleted the igemm_bwd_xdlops_v4r1 branch April 20, 2020 02:49
@shaojiewang shaojiewang restored the igemm_bwd_xdlops_v4r1 branch April 21, 2020 09:19
@shaojiewang shaojiewang reopened this Apr 21, 2020
@daniellowell
Copy link
Copy Markdown

Tidy error:

/var/jenkins/workspace/ibs_MIOpen_igemm_bwd_xdlops_v4r1/src/solver/conv_hip_implicit_gemm_bwd_data_v4r1_xdlops.cpp:218:50: error: implicit conversion bool -> 'unsigned int' [readability-implicit-bool-conversion,-warnings-as-errors]

                                                 true);

                                                 ^~~~

                                                 1u

zjing14
zjing14 previously approved these changes Apr 21, 2020
Copy link
Copy Markdown
Contributor

@zjing14 zjing14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@shaojiewang
Copy link
Copy Markdown
Contributor Author

Tidy error:

/var/jenkins/workspace/ibs_MIOpen_igemm_bwd_xdlops_v4r1/src/solver/conv_hip_implicit_gemm_bwd_data_v4r1_xdlops.cpp:218:50: error: implicit conversion bool -> 'unsigned int' [readability-implicit-bool-conversion,-warnings-as-errors]

                                                 true);

                                                 ^~~~

                                                 1u

modified by the latest commit

@shaojiewang
Copy link
Copy Markdown
Contributor Author

Looks good

Thanks.

@shaojiewang
Copy link
Copy Markdown
Contributor Author

gfx90878.HIP.2_4_0.ufdb.txt

perf db attached

@shaojiewang
Copy link
Copy Markdown
Contributor Author

@daniellowell Hi, Daniel, and for find_db update label, what is supposed to provided?

@daniellowell
Copy link
Copy Markdown

@shaojiewang Merge conflicts.

@shaojiewang
Copy link
Copy Markdown
Contributor Author

@shaojiewang Merge conflicts.

ok.

@shaojiewang
Copy link
Copy Markdown
Contributor Author

@asroy Hi, Chao, could you please re-review this PR in pub repo? Thanks.

@shaojiewang shaojiewang requested a review from zjing14 April 29, 2020 08:23
@shaojiewang
Copy link
Copy Markdown
Contributor Author

shaojiewang commented May 7, 2020

@asroy Hi, Chao, could you please check the performance data in the following link?

igemm_bwd_v4r1_xdlops_performance_data

@ghost
Copy link
Copy Markdown

ghost commented May 8, 2020

DeepCode's analysis on #ac672e found:

⚠️ 1 warning

👉 View analysis in DeepCode’s Dashboard

☺️ If you want to provide feedback on our bot, here is how to contact us.

@shaojiewang shaojiewang requested a review from TejashShah May 8, 2020 08:35
Comment thread src/include/miopen/solver.hpp Outdated
Comment thread src/include/miopen/solver.hpp Outdated
Comment thread src/solver/conv_hip_implicit_gemm_bwd_data_v4r1_xdlops.cpp Outdated
Comment thread src/solver/conv_hip_implicit_gemm_bwd_data_v4r1_xdlops.cpp
@asroy
Copy link
Copy Markdown
Contributor

asroy commented May 9, 2020

@TejashShah

@asroy Hi, Chao, could you please check the performance data in the following link?
https://amdcloud-my.sharepoint.com/:x:/g/personal/shaowang_amd_com/ET95GfZmRc1LgnkbsphBlykBUvMSd_PKcJBsFVm72fx4TQ?e=zdPV3w

@asroy
Copy link
Copy Markdown
Contributor

asroy commented May 9, 2020

OK, I will do it. Can I just post the configs which are improved in iGEMM kernels?

The cases you posted seems enough for me, but if you have already tested other cases, please post all the cases you have collected (all cases you tested, all solvers you tested), not the only the "good" performing cases.

Also do some tests with hip-clang and check if there is correctness issue or performance issue, you can use the same test cases you posted

Comment thread src/include/miopen/solver.hpp
@shaojiewang
Copy link
Copy Markdown
Contributor Author

OK, I will do it. Can I just post the configs which are improved in iGEMM kernels?

The cases you posted seems enough for me, but if you have already tested other cases, please post all the cases you have collected (all cases you tested, all solvers you tested), not the only the "good" performing cases.

@asroy Yes, OK, I will posted them. Thanks

Also do some tests with hip-clang and check if there is correctness issue or performance issue, you can use the same test cases you posted

@asroy Yes, I will test this PR with hip-clang today. Thanks

@atamazov
Copy link
Copy Markdown
Contributor

atamazov commented May 9, 2020

Shall we avoid posting confluence links https://github.com/AMDComputeLibraries/MLOpen/issues/2522#issue-592311067

@shaojiewang
Copy link
Copy Markdown
Contributor Author

Shall we avoid posting confluence links AMDComputeLibraries/MLOpen#2522 (comment)

Yes.

@TejashShah
Copy link
Copy Markdown
Contributor

For future reference,

I see 3 files in this review marked to be changed but no changes inside. @shaojiewang if you are touching these files for some reason, could you please not do 'git add' when you haven't done any material change? Otherwise, they show up as changed files with no changes inside.

src/include/miopen/execution_context.hpp
src/include/miopen/sqlite_db.hpp
src/ocl/convolutionocl.cpp

TejashShah
TejashShah previously approved these changes May 12, 2020
@shaojiewang
Copy link
Copy Markdown
Contributor Author

For future reference,

I see 3 files in this review marked to be changed but no changes inside. @shaojiewang if you are touching these files for some reason, could you please not do 'git add' when you haven't done any material change? Otherwise, they show up as changed files with no changes inside.

src/include/miopen/execution_context.hpp
src/include/miopen/sqlite_db.hpp
src/ocl/convolutionocl.cpp

@TejashShah Yes, OK. Thank you.

@shaojiewang
Copy link
Copy Markdown
Contributor Author

OK, I will do it. Can I just post the configs which are improved in iGEMM kernels?

The cases you posted seems enough for me, but if you have already tested other cases, please post all the cases you have collected (all cases you tested, all solvers you tested), not the only the "good" performing cases.

Also do some tests with hip-clang and check if there is correctness issue or performance issue, you can use the same test cases you posted

@asroy Hi, Chao, I updated the test data in that link. Could you please re-review it.

@shaojiewang shaojiewang requested a review from daniellowell May 15, 2020 03:14
@asroy
Copy link
Copy Markdown
Contributor

asroy commented May 15, 2020

@shaojiewang PR LGTM

I checked you perf number. hcc vs hip-clang performance, I'm seeing signification performance regression for some configuration for v1r1-xdlops.

Please open JIRA ticket and copy a link here.

@shaojiewang
Copy link
Copy Markdown
Contributor Author

@shaojiewang PR LGTM

I checked you perf number. hcc vs hip-clang performance, I'm seeing signification performance regression for some configuration for v1r1-xdlops.

Please open JIRA ticket and copy a link here.

Yes, OK. JIRA is :SWDEV-236800

@daniellowell daniellowell merged commit 6dbc3d9 into develop May 20, 2020
@shaojiewang shaojiewang deleted the igemm_bwd_xdlops_v4r1 branch May 21, 2020 01:44
@atamazov atamazov mentioned this pull request May 25, 2020
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants