fix(ci): render all benchmark columns in summary, not just allowlisted backends by hannahli-nv · Pull Request #109 · NVIDIA/TileGym

hannahli-nv · 2026-04-21T03:19:59Z

Description

The benchmark summary page silently dropped columns for any backend not in the hardcoded allowlist {CuTile, PyTorch, Triton, TorchCompile} in .github/scripts/format_benchmark_summary.py. Every non-allowlisted key was also treated as the x-axis parameter and overwrote param_name, so only the last such key survived and the real x-axis column was lost in several benches.

This PR replaces the allowlist with direct iteration over configs[0].keys(), which run_all_json.py already emits in Triton's perf_report column order (x-axis params first, then backends).
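A minimal sketch of the post-fix rendering idea follows. It assumes each entry in configs is a dict whose key order matches what run_all_json.py emits. The function name render_table and the exact Markdown layout are hypothetical; the real format_benchmark_summary.py is not reproduced here, and the values in the example are placeholders, not measurements.

```python
# Hypothetical sketch; the actual format_benchmark_summary.py may differ.
from typing import Any, Dict, List


def render_table(configs: List[Dict[str, Any]]) -> str:
    """Render benchmark rows as a Markdown table, one column per key.

    Column order follows configs[0]'s key order, which (per this PR)
    run_all_json.py emits as x-axis params first, then backends.
    """
    if not configs:
        return ""
    # No allowlist: every key in the first row becomes a column.
    columns = list(configs[0].keys())
    lines = [
        "| " + " | ".join(columns) + " |",
        "|" + "---|" * len(columns),
    ]
    for row in configs:
        # Missing keys render as empty cells rather than raising.
        lines.append("| " + " | ".join(str(row.get(c, "")) for c in columns) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    # bench_bmm-shaped keys; the numbers are placeholders, not real results.
    configs = [{"M": 1024, "N": 1024, "K": 512, "CuTile": 1.0, "PyTorch": 1.0}]
    print(render_table(configs))
```

Because the header is taken straight from the emitted keys, multi-axis benches such as bench_bmm keep M, N and K, and new backend names need no code change.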

Before / After (from CI run 24546853000 artifacts)

bench_attention_backward (4 SDPA backends + N_CTX)

Before: | SDPA-Math | CuTile | (N_CTX / SDPA-Flash / SDPA-MemEff dropped)
After: | N_CTX | CuTile | SDPA-Flash | SDPA-MemEff | SDPA-Math |

bench_bmm (multi-axis M/N/K)

Before: | K | CuTile | PyTorch | (M / N dropped)
After: | M | N | K | CuTile | PyTorch |

bench_mix_triton_cutile

Before: | Triton+CuTile | PyTorch | (N dropped)
After: | N | PyTorch | Triton+CuTile |

Benches that already rendered correctly (e.g. the ~22 files with a single x-axis param plus CuTile and PyTorch) are unchanged.

Diff

14 insertions, 17 deletions in a single file. No schema changes, no new imports, no changes to run_all_json.py or check_benchmark_regression.py.

Test plan

Rendered the patched script against real CI artifacts for bench_attention_backward, bench_bmm, bench_mhc, bench_mix_triton_cutile, and bench_swa_attention; all produce the expected tables.
Every bench whose keys are {x, CuTile, PyTorch} continues to render identically to today.
Next tilegym-ci run's summary page shows all backend columns for the affected benches.

Known limitations (deliberately not addressed here)

run_all_json.py::parse_benchmark_output still tokenizes the pandas header with .split(), so any future bench whose line_names entry contains whitespace would fragment into multiple columns. Hardening this parser is left to a follow-up.
check_benchmark_regression.py has a similar allowlist, but uses break, so it already picks the correct param_name in practice. Leaving it alone to keep this diff focused.
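The whitespace limitation can be sketched in a few lines. split_on_whitespace mirrors the .split() behaviour described above; split_on_wide_gaps is one possible hardening, assuming (not verified here) that the pandas header separates columns by at least two spaces and that no backend name itself contains a run of two or more spaces. The header string and the name "SDPA Flash" are hypothetical.

```python
# Hypothetical illustration of the known limitation; not the actual parser.
import re
from typing import List


def split_on_whitespace(header: str) -> List[str]:
    # Current behaviour as described: plain str.split() on any whitespace.
    return header.split()


def split_on_wide_gaps(header: str) -> List[str]:
    # ASSUMPTION: columns are separated by 2+ spaces and names never
    # contain 2+ consecutive spaces.
    return [tok for tok in re.split(r"\s{2,}", header.strip()) if tok]


if __name__ == "__main__":
    header = "   N_CTX  CuTile  SDPA Flash"  # hypothetical line_names entry with a space
    print(split_on_whitespace(header))  # ['N_CTX', 'CuTile', 'SDPA', 'Flash']
    print(split_on_wide_gaps(header))   # ['N_CTX', 'CuTile', 'SDPA Flash']
```

A gap-based split is still heuristic; reading column names from structured output instead of the printed header would avoid the ambiguity entirely.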

Effect

CI Configuration

config:
  build: true
  # valid options are "ops", "benchmark", and "sanity"
  test: ["benchmark"]

Checklist

Code formatted and imports sorted via repo specifications (./format.sh)
Documentation updated (if needed)
CI configuration reviewed

copy-pr-bot · 2026-04-21T03:20:03Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

hannahli-nv · 2026-04-21T03:28:29Z

/ok to test 374a9f4

format_benchmark_summary.py previously used a hardcoded backend allowlist {CuTile, PyTorch, Triton, TorchCompile}. Any key outside that list was reclassified as the x-axis parameter, and because param_name is a scalar that gets overwritten, only the last such key survived while earlier x-axis columns were silently dropped.

Visible effects in existing CI runs:

- bench_attention_backward: SDPA-Flash / SDPA-MemEff / SDPA-Math and the N_CTX column were all lost, leaving | SDPA-Math | CuTile |.
- bench_bmm / group_gemm / matrix_multiplication / persistent_matmul: only K survived from (M, N, K); M and N were dropped.
- bench_mix_triton_cutile: N was dropped, leaving | Triton+CuTile | PyTorch |.
- bench_mhc: DeepGemm was dropped when the backend was available.

Replace the allowlist with direct iteration over configs[0].keys(), which run_all_json.py already emits in Triton perf_report column order (x-axis params first, then backends). Benches that rendered correctly today are unaffected.
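To make the overwrite-versus-break distinction concrete, here is a hypothetical reconstruction of the two allowlist variants described above, next to the post-fix behaviour. It models only column selection from the key list; the function names are illustrative, not taken from the repo, and it assumes the old renderer placed param_name before the allowlisted backends, which matches the observed | SDPA-Math | CuTile | output.

```python
# Hypothetical reconstruction of the described behaviour; the real code in
# format_benchmark_summary.py / check_benchmark_regression.py may differ.
from typing import List, Optional

ALLOWLIST = {"CuTile", "PyTorch", "Triton", "TorchCompile"}


def old_columns(keys: List[str]) -> List[str]:
    """Allowlist + overwrite: the last non-allowlisted key wins as param_name."""
    param_name: Optional[str] = None
    backends: List[str] = []
    for key in keys:
        if key in ALLOWLIST:
            backends.append(key)
        else:
            param_name = key  # scalar overwritten on every non-allowlisted key
    return ([param_name] if param_name else []) + backends


def break_param_name(keys: List[str]) -> Optional[str]:
    """Allowlist + break: the first non-allowlisted key is kept as param_name."""
    param_name: Optional[str] = None
    for key in keys:
        if key not in ALLOWLIST:
            param_name = key
            break  # stop at the first x-axis key instead of overwriting
    return param_name


def new_columns(keys: List[str]) -> List[str]:
    """Post-fix behaviour: every key, in emitted order."""
    return list(keys)


if __name__ == "__main__":
    attn = ["N_CTX", "CuTile", "SDPA-Flash", "SDPA-MemEff", "SDPA-Math"]
    print(old_columns(attn))       # ['SDPA-Math', 'CuTile']
    print(break_param_name(attn))  # N_CTX
    print(new_columns(attn))       # all five keys, in order
```

For a key list like ["N", "CuTile", "PyTorch"], old_columns and new_columns agree, which is why the single-x-axis benches render identically before and after the fix.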

hannahli-nv · 2026-04-21T07:25:22Z

/ok to test 60a2145

hannahli-nv requested a review from xjmxyt April 21, 2026 06:53

xjmxyt approved these changes Apr 21, 2026


hannahli-nv force-pushed the fix/benchmark-summary-allowlist branch from 374a9f4 to 60a2145 April 21, 2026 07:25

hannahli-nv merged commit 479a37d into main Apr 21, 2026
17 checks passed

hannahli-nv deleted the fix/benchmark-summary-allowlist branch April 21, 2026 09:15

hannahli-nv merged 1 commit into main from fix/benchmark-summary-allowlist