Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

aldebaran BBS tuning #1284

Merged
merged 2 commits into from
Jan 19, 2023
Merged

Conversation

benjaminulmer
Copy link
Contributor

Tuning for SWDEV-372453

All sizes are new and there are some new kernels as well.

Copy link
Contributor

@babakpst babakpst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are using the re-tuning script for this PR. Ideally, we should not see any new kernels in the diff (I need to update merge.py to not adds duplicates). The performances of new sizes are low. Please merge only if this PR solves the issue.

Copy link
Contributor

@TorreZuk TorreZuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same precheckin HMM failures as baseline, LGTM if extended and staging testing pass

@benjaminulmer
Copy link
Contributor Author

I assume you are using the re-tuning script for this PR

It's a combination a retuning and regular tuning. All the kernels added are true new kernels

@babakpst
Copy link
Contributor

babakpst commented Jan 3, 2023

This PR is ready to merge. The failed tests are unrelated to these changes.

NaveenElumalaiAMD pushed a commit to NaveenElumalaiAMD/rocBLAS that referenced this pull request Jan 9, 2023
@nielenventer
Copy link
Contributor

nielenventer commented Jan 11, 2023

This patch causes a regression for large GEMMs (SWDEV-375718), the reason seems to be that there are no tuning points for larger M values (this one stops at 16), and so some large GEMMs are picking the kernels for the M=16 case (v small tile size, perf regression). I'm doing some extra tuning for larger M and will update the PR.

@nielenventer
Copy link
Contributor

I added some new exact sizes, with large M, following the pattern of the other tunings in this commit. I confirmed it fixes the regression.

Only the Retune tool was used, but unfortunately new kernels are added due to known issue with merge script.

@babakpst babakpst merged commit 9d6f87a into ROCm:release/rocm-rel-5.5 Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants