Use arch when writing to tuningDB, perfRunner looks for arch #2259
Merged
umangyadav merged 6 commits into develop on Mar 3, 2026
dhernandez0
approved these changes
Feb 27, 2026
justinrosner
approved these changes
Feb 27, 2026
Pull request overview
This PR fixes a bug where tuningRunner was writing the chip name (e.g., "gfx950") to the tuning database instead of the full architecture string (e.g., "gfx950:sramecc+:xnack-"). This caused perfRunner to fail to find matching entries when looking up tuning configurations, resulting in NaN TFlops values.
Changes:
- Changed tuningRunner to write arch instead of chip to the tuning database, aligning with the database schema and perfRunner's lookup expectations
dorde-antic
approved these changes
Feb 27, 2026
Mr-Anyone
approved these changes
Feb 27, 2026
mirza-halilcevic
approved these changes
Feb 27, 2026
Co-authored-by: Mirza Halilčević <109971222+mirza-halilcevic@users.noreply.github.com>
cursor Bot pushed a commit that referenced this pull request on Mar 13, 2026
Automated weekly review of merged PRs #2234 #2240 #2248 #2249 #2251 #2254 #2257 #2258 #2259 #2270 #2271.
Identifies 6 areas with weak test coverage and meaningful business risk:
1. ConcurrentQueue (no unit tests, multi-threaded, silent deadlock risk)
2. parse_tuning_db_line / read_tuning_db key schema change (no Python tests)
3. BooleanElementwiseConverter missing f16/unsigned dtype coverage
4. Attention MaxNumFOp vs MaximumFOp NaN correctness (no dedicated test)
5. firstCausalMaskIter off-by-one risk (no non-trivial shape test)
6. Sliding window attention edge cases (windowSize=0/>=seqLen/unaligned)
The GitHub discussion API returned FORBIDDEN (read-only CI token); analysis committed here as a permanent record.
Co-authored-by: Djordje Antic <djordje.antic@amd.com>
Motivation
The tuning DB has an "arch" column, but tuningRunner currently writes the "chip" value into it.
rocMLIR/mlir/utils/performance/tuningRunner.py
Line 935 in 88adf76
Arch holds the full architecture string, e.g. gfx950:sramecc+:xnack-.
Chip holds only the chip name, e.g. gfx950.
perfRunner looks up tuning entries by "arch".
rocMLIR/mlir/utils/performance/perfRunner.py
Line 1812 in 88adf76
When it cannot find a matching entry, it sets TFlops to NaN.
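The mismatch can be illustrated with a minimal Python sketch. It is not the code from tuningRunner.py or perfRunner.py: the helper names `chip_from_arch` and `lookup_tflops`, the dictionary-based database, the config string, and the TFlops value are all hypothetical and exist only to show why a key written with the chip name is never found by a lookup that uses the full arch string.

```python
import math


def chip_from_arch(arch: str) -> str:
    """Return the part of the arch string before the first ':' (the chip name)."""
    return arch.split(":", 1)[0]


def lookup_tflops(tuning_db: dict, arch: str, config: str) -> float:
    """Hypothetical lookup: return the stored value, or NaN if (arch, config) is absent."""
    return tuning_db.get((arch, config), math.nan)


arch = "gfx950:sramecc+:xnack-"
config = "example-gemm-config"  # made-up placeholder, not a real config line

# Before the fix (sketch): the entry is keyed by the chip name only.
db_before = {(chip_from_arch(arch), config): 123.4}  # 123.4 is an illustrative value
# After the fix (sketch): the entry is keyed by the full arch string.
db_after = {(arch, config): 123.4}

result_before = lookup_tflops(db_before, arch, config)  # key mismatch, falls back to NaN
result_after = lookup_tflops(db_after, arch, config)    # key matches, value is returned
print(result_before, result_after)
```

Under these assumptions, the lookup against `db_before` misses because "gfx950" and "gfx950:sramecc+:xnack-" are different keys, which mirrors the NaN symptom described above.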
Technical Details
Make tuningRunner store "arch" instead of "chip".
Test Plan
Steps to reproduce:
python3 ./bin/tuningRunner.py --operation gemm -c ../mlir/utils/performance/configs/tier1-gemm-configs --tuning-space=quick -o gemm_quick.tsv | python3 ./bin/perfRunner.py --operation gemm -c ../mlir/utils/performance/configs/tier1-gemm-configs -t gemm_quick.tsv -o gemm.csv --batch_mlir
Before this PR, this produces NaN TFlops values.
After this PR, running the same command produces correct TFlops values.
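One way to check the result automatically is to count NaN entries in the output CSV. The sketch below is only an illustration: the column name "TFlops" and the sample rows are assumptions, since the actual layout of gemm.csv is not shown here, and `count_nan_rows` is a hypothetical helper rather than part of perfRunner.

```python
import csv
import io
import math


def count_nan_rows(csv_text: str, column: str = "TFlops") -> int:
    """Count rows whose value in `column` is NaN, missing, or not a number."""
    reader = csv.DictReader(io.StringIO(csv_text))
    count = 0
    for row in reader:
        try:
            value = float(row[column])
        except (KeyError, ValueError, TypeError):
            # Treat a missing column or an unparsable cell as NaN.
            value = math.nan
        if math.isnan(value):
            count += 1
    return count


# Made-up sample data standing in for the real perfRunner output.
sample_before = "Config,TFlops\ncfg0,nan\ncfg1,NaN\n"
sample_after = "Config,TFlops\ncfg0,101.5\ncfg1,98.2\n"

nan_before = count_nan_rows(sample_before)
nan_after = count_nan_rows(sample_after)
print(nan_before, nan_after)
```

To apply it to a real run, read the file contents (for example with `open("gemm.csv").read()`) and pass them to `count_nan_rows`, adjusting the column name to match the actual header.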