[Conv] Enable bwd weight splitk autodeduction with cap#3656

Merged
johannes-graner merged 8 commits into develop from jograner/bwd-weight-splitk-autodeduce
Jan 29, 2026

@johannes-graner
Contributor

Proposed changes

Enables split-k autodeduction for old CK bwd weight convolutions. Because autodeduction can produce very large split-k values, which lead to inaccuracy, the deduced value is capped at 128. Assuming that performance is a convex function of split-k and that the auto-deduced value is correct (that is, optimal), a deduced value above 128 means performance keeps improving all the way up to 128, so the cap selects the best value within the reasonable range {1, ..., 128}.
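
The capping step described above can be sketched as a simple clamp. This is a minimal illustration, not the code from this PR: the names `kMaxAutoSplitK` and `CapSplitK` are hypothetical, and the lower bound of 1 is an assumption (the description only states the upper cap of 128).

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical names: neither kMaxAutoSplitK nor CapSplitK appears in the PR.
// The value 128 is the cap stated in the PR description.
constexpr std::int32_t kMaxAutoSplitK = 128;

// Clamp an auto-deduced split-k value into [1, kMaxAutoSplitK].
// The lower bound of 1 is an assumption made for this sketch.
constexpr std::int32_t CapSplitK(std::int32_t deduced_split_k)
{
    return std::clamp(deduced_split_k, std::int32_t{1}, kMaxAutoSplitK);
}
```

Under the convexity assumption, a deduced value such as 512 maps to 128, while a deduced value of 37 is left unchanged.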

Enabling autodeduction led to tests using larger split-k values, which exposed a bug in the profiler error threshold calculations. This was also addressed since it made the tests fail.

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers to understand the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

bartekxk previously approved these changes Jan 27, 2026
@johannes-graner johannes-graner force-pushed the jograner/bwd-weight-splitk-autodeduce branch from 4feac1e to 55d8e9b January 28, 2026 07:12
@johannes-graner johannes-graner enabled auto-merge (squash) January 29, 2026 06:44
vpietila-amd previously approved these changes Jan 29, 2026
Contributor

@vpietila-amd vpietila-amd left a comment

Looks good to me. One minor comment regarding the WMMA multi-D kernel, but we can fix this separately.


static bool IsSupportedArgument(const Argument& arg)
{
#if DISABLE_SPLIT_K_AUTODEDUCE_FOR_ONE_STAGE_KERNELS

Can you modify this so that we return false if BlkGemmPipelineVer == BlockGemmPipelineVersion::v4? The auto-deduced split-K value is probably too low in that case.
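
One way the requested guard might look is sketched below. This is an assumption-laden illustration, not the kernel's actual code: `DeviceOpSketch` and the `Argument` struct are hypothetical stand-ins, the enum only mirrors the name `BlockGemmPipelineVersion` used in the comment, and treating a negative `split_k` as "auto-deduce" is an assumed convention.

```cpp
// Stand-in for the CK enum of the same name; defined here only so the
// sketch is self-contained.
enum class BlockGemmPipelineVersion { v1, v2, v3, v4, v5 };

// Hypothetical argument type. Assumed convention: split_k < 0 requests
// auto-deduction of the split-K value.
struct Argument
{
    int split_k;
};

// Hypothetical device-op wrapper parameterised on the pipeline version.
template <BlockGemmPipelineVersion BlkGemmPipelineVer>
struct DeviceOpSketch
{
    static bool IsSupportedArgument(const Argument& arg)
    {
        // Reject auto-deduction for the v4 pipeline, where the deduced
        // split-K value is suspected to be too low.
        if constexpr(BlkGemmPipelineVer == BlockGemmPipelineVersion::v4)
        {
            if(arg.split_k < 0)
            {
                return false;
            }
        }
        return true;
    }
};
```

With this shape, an explicit split-K value is still accepted on v4, and only the auto-deduced path is rejected. Whether the real kernel should reject v4 unconditionally is left open by the comment.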

bartekxk previously approved these changes Jan 29, 2026
@johannes-graner johannes-graner dismissed stale reviews from bartekxk and vpietila-amd via 66d2769 January 29, 2026 09:41
@johannes-graner johannes-graner merged commit fabac7e into develop Jan 29, 2026
20 checks passed
@johannes-graner johannes-graner deleted the jograner/bwd-weight-splitk-autodeduce branch January 29, 2026 17:40

3 participants