fix(ci): render all benchmark columns in summary, not just allowlisted backends #109
Merged
hannahli-nv merged 1 commit into main on Apr 21, 2026
Author (collaborator): /ok to test 374a9f4
xjmxyt approved these changes on Apr 21, 2026
format_benchmark_summary.py previously used a hardcoded backend allowlist
{CuTile, PyTorch, Triton, TorchCompile}. Any key outside that list was
reclassified as the x-axis parameter, and because param_name is a single
scalar that each such key overwrites, only the last one survived; every
earlier non-allowlisted column (x-axis params and backends alike) was
silently dropped.
Visible effects in existing CI runs:
- bench_attention_backward: N_CTX, SDPA-Flash, and SDPA-MemEff were all
lost, and SDPA-Math (the last non-allowlisted key) took over the x-axis
slot, leaving | SDPA-Math | CuTile |.
- bench_bmm / group_gemm / matrix_multiplication / persistent_matmul:
only K survived from (M, N, K); M and N were dropped.
- bench_mix_triton_cutile: N was dropped, leaving
| Triton+CuTile | PyTorch |.
- bench_mhc: DeepGemm was dropped when the backend was available.
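The overwrite mechanism can be reproduced in a few lines. This is a minimal sketch under stated assumptions: `allowlist_columns`, `first_non_backend`, and the sample values are hypothetical, not the actual code in format_benchmark_summary.py or check_benchmark_regression.py. Fed rows whose key order matches what run_all_json.py emits, the overwrite logic yields the same broken headers listed above, while a first-match (`break`-style) loop keeps the first x-axis key instead.

```python
# Hypothetical reconstruction of the old allowlist logic; the real
# format_benchmark_summary.py code may be structured differently.
KNOWN_BACKENDS = {"CuTile", "PyTorch", "Triton", "TorchCompile"}


def allowlist_columns(row):
    """Return the header the allowlist approach renders for one config row."""
    param_name = None  # a single scalar slot for "the" x-axis column
    backends = []
    for key in row:
        if key in KNOWN_BACKENDS:
            backends.append(key)
        else:
            # Each non-allowlisted key overwrites the previous one,
            # so only the last such key survives.
            param_name = key
    return [param_name] + backends


def first_non_backend(row):
    """First-match variant, like a loop that assigns and then breaks."""
    for key in row:
        if key not in KNOWN_BACKENDS:
            return key
    return None


# Key order follows the fixed headers from this PR; the values are made up.
attention_row = {"N_CTX": 1024, "CuTile": 1.0, "SDPA-Flash": 1.1,
                 "SDPA-MemEff": 1.2, "SDPA-Math": 1.3}
bmm_row = {"M": 512, "N": 512, "K": 256, "CuTile": 0.9, "PyTorch": 1.0}

print(allowlist_columns(attention_row))  # ['SDPA-Math', 'CuTile']
print(allowlist_columns(bmm_row))        # ['K', 'CuTile', 'PyTorch']
print(first_non_backend(bmm_row))        # M
```

In this sketch the first-match variant picks `M`, a sensible `param_name`, but it still has room for only one x-axis key.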
Replace the allowlist with direct iteration over configs[0].keys(),
which run_all_json.py already emits in Triton perf_report column order
(x-axis params first, then backends). Benches that already rendered
correctly are unaffected.
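A minimal sketch of the fixed approach, assuming each config row is a Python dict whose insertion order matches what run_all_json.py emits: the header is taken straight from `configs[0].keys()`, so nothing is reclassified or overwritten. `summary_columns` and `render_markdown_table` are hypothetical helper names, not the script's actual functions.

```python
def summary_columns(configs):
    """Header in emitted order: x-axis params first, then backends."""
    return list(configs[0].keys())


def render_markdown_table(configs):
    """Render a list of row dicts as a markdown table (hypothetical helper)."""
    columns = summary_columns(configs)
    lines = [
        "| " + " | ".join(columns) + " |",
        "|" + "---|" * len(columns),
    ]
    for row in configs:
        # .get() leaves a blank cell if a row lacks one of the columns.
        cells = (str(row.get(col, "")) for col in columns)
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


configs = [{"M": 512, "N": 512, "K": 256, "CuTile": 0.9, "PyTorch": 1.0}]
print(render_markdown_table(configs))
# | M | N | K | CuTile | PyTorch |
# |---|---|---|---|---|
# | 512 | 512 | 256 | 0.9 | 1.0 |
```

Because only `configs[0]` defines the header, this sketch assumes every row shares the same keys; a column that appeared only in later rows would not be rendered.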
374a9f4 to 60a2145
Author (collaborator): /ok to test 60a2145
Description
The benchmark summary page silently dropped columns for any backend not in the hardcoded allowlist {CuTile, PyTorch, Triton, TorchCompile} in `.github/scripts/format_benchmark_summary.py`. Non-allowlisted keys were also overwriting `param_name`, so the x-axis column was lost in several benches. This PR replaces the allowlist with direct iteration over `configs[0].keys()`, which `run_all_json.py` already emits in Triton's perf_report order (x-axis params first, then backends).
Before / After (from CI run 24546853000 artifacts)
- `bench_attention_backward` (4 SDPA backends + N_CTX)
  - Before: `| SDPA-Math | CuTile |` (N_CTX / SDPA-Flash / SDPA-MemEff dropped)
  - After: `| N_CTX | CuTile | SDPA-Flash | SDPA-MemEff | SDPA-Math |`
- `bench_bmm` (multi-axis M/N/K)
  - Before: `| K | CuTile | PyTorch |` (M / N dropped)
  - After: `| M | N | K | CuTile | PyTorch |`
- `bench_mix_triton_cutile`
  - Before: `| Triton+CuTile | PyTorch |` (N dropped)
  - After: `| N | PyTorch | Triton+CuTile |`
- Every bench already rendering correctly (e.g. the ~22 single-x-axis CuTile + PyTorch files): unchanged.
Diff
14 insertions, 17 deletions in a single file. No schema changes, no new imports, and no changes to `run_all_json.py` or `check_benchmark_regression.py`.
Test plan
- `bench_attention_backward`, `bench_bmm`, `bench_mhc`, `bench_mix_triton_cutile`, and `bench_swa_attention`: all produce the expected tables.
- A bench with columns `{x, CuTile, PyTorch}` continues to render identically to today.
- The `tilegym-ci` run's summary page shows all backend columns for the affected benches.
Known limitations (deliberately not addressed here)
- `run_all_json.py::parse_benchmark_output` still tokenizes the pandas header with `.split()`, so any future bench whose `line_names` entry contains whitespace would fragment into multiple columns. Hardening this parser is left to a follow-up.
- `check_benchmark_regression.py` has a similar allowlist, but uses `break`, so it already picks the correct `param_name` in practice. Leaving it alone to keep this diff focused.
Effect
CI Configuration
Checklist
./format.sh)