aldebaran BBS tuning #1284

benjaminulmer · 2022-12-19T23:10:47Z

Tuning for SWDEV-372453

All sizes are new and there are some new kernels as well.

babakpst

I assume you are using the re-tuning script for this PR. Ideally, we should not see any new kernels in the diff (I need to update merge.py so that it does not add duplicates). The performance of the new sizes is low. Please merge only if this PR solves the issue.
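For illustration, duplicate-free merging could look roughly like the sketch below. This is not the actual merge.py logic: the function names, the kernel field names (`SolutionIndex`, `SolutionNameMin`, `MacroTile`) and the choice to compare kernels by all remaining parameters are assumptions made for the example.

```python
import json

# Hypothetical bookkeeping fields that do not affect kernel identity.
IGNORED_FIELDS = ("SolutionIndex", "SolutionNameMin")


def kernel_key(kernel):
    """Build a canonical, hashable key from every non-bookkeeping parameter."""
    params = {k: v for k, v in kernel.items() if k not in IGNORED_FIELDS}
    return json.dumps(params, sort_keys=True)


def merge_kernels(existing, incoming):
    """Append only kernels not already present.

    Returns the merged kernel list and, for each incoming kernel,
    the index it maps to in the merged list.
    """
    merged = list(existing)
    index_of = {kernel_key(k): i for i, k in enumerate(merged)}
    remap = []
    for kernel in incoming:
        key = kernel_key(kernel)
        if key not in index_of:
            index_of[key] = len(merged)
            merged.append(kernel)
        remap.append(index_of[key])
    return merged, remap


existing = [{"SolutionIndex": 0, "MacroTile": [16, 16]}]
incoming = [
    {"SolutionIndex": 0, "MacroTile": [16, 16]},    # same parameters as existing[0]
    {"SolutionIndex": 1, "MacroTile": [256, 256]},  # genuinely new kernel
]
merged, remap = merge_kernels(existing, incoming)
print(len(merged), remap)
```

With this approach a re-tuned size that selects an existing kernel would reuse that kernel's index, so only genuinely new kernels would show up in the diff.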

TorreZuk

Same precheckin HMM failures as baseline. LGTM if extended and staging tests pass.

benjaminulmer · 2022-12-21T23:30:20Z

> I assume you are using the re-tuning script for this PR

It's a combination of retuning and regular tuning. All the kernels added are genuinely new kernels.

babakpst · 2023-01-03T21:21:22Z

This PR is ready to merge. The failed tests are unrelated to these changes.

nielenventer · 2023-01-11T23:18:19Z

This patch causes a regression for large GEMMs (SWDEV-375718). The reason seems to be that there are no tuning points for larger M values (this tuning stops at M=16), so some large GEMMs pick the kernels tuned for the M=16 case. Those kernels use a very small tile size, which causes the performance regression. I'm doing some extra tuning for larger M and will update the PR.
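A minimal sketch of how this kind of regression can arise, assuming the library picks the kernel of the tuned (M, N, K) point closest to the requested size. The Euclidean distance metric, the function name `pick_kernel`, and the kernel labels are assumptions for illustration; the real selection logic may differ.

```python
import math


def pick_kernel(tuned, m, n, k):
    """Return the kernel attached to the tuned (M, N, K) point nearest the request."""
    nearest = min(tuned, key=lambda point: math.dist(point, (m, n, k)))
    return tuned[nearest]


# Hypothetical tuning table whose M values stop at 16.
tuned = {
    (1, 1024, 1024): "small_tile",
    (16, 1024, 1024): "small_tile",
}

# A large-M request falls back to the M=16 entry and gets the small tile.
print(pick_kernel(tuned, 4096, 1024, 1024))  # small_tile

# Adding an exact tuning point with large M changes the selection.
tuned[(4096, 1024, 1024)] = "large_tile"
print(pick_kernel(tuned, 4096, 1024, 1024))  # large_tile
```

Under this assumption, any request with M far above 16 is served by a kernel tuned for tiny M until larger tuning points exist.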

nielenventer · 2023-01-18T04:04:49Z

I added some new exact sizes, with large M, following the pattern of the other tunings in this commit. I confirmed it fixes the regression.

Only the Retune tool was used, but unfortunately new kernels were added due to a known issue with the merge script.

aldebaran BBS tuning

a4cd04b

benjaminulmer requested review from amcamd, TorreZuk, mahmoodw, daineAMD, bragadeesh, NaveenElumalaiAMD, rkamd, yoichiyoshida and babakpst as code owners December 19, 2022 23:10

babakpst approved these changes Dec 19, 2022

TorreZuk approved these changes Dec 20, 2022

NaveenElumalaiAMD pushed a commit to NaveenElumalaiAMD/rocBLAS that referenced this pull request Jan 9, 2023

Only check for PR labels if running PR CI job (ROCm#1284)

f6730d8

Add some newer tunings (higher M) to account for regression on larger sizes

5c643b3

babakpst merged commit 9d6f87a into ROCm:release/rocm-rel-5.5 Jan 19, 2023
