fix(ci): render all benchmark columns in summary, not just allowlisted backends #109

Merged
hannahli-nv merged 1 commit into main from fix/benchmark-summary-allowlist
Apr 21, 2026

hannahli-nv commented Apr 21, 2026

Description

The benchmark summary page silently dropped columns for any backend not in the hardcoded allowlist {CuTile, PyTorch, Triton, TorchCompile} in .github/scripts/format_benchmark_summary.py. Non-allowlisted keys were also overwriting param_name, so the x-axis column was lost in several benches.

This change replaces the allowlist with direct iteration over configs[0].keys(), which run_all_json.py already emits in Triton's perf_report column order (x-axis params first, then backends).
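
The difference between the two approaches can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in format_benchmark_summary.py: the function names `columns_with_allowlist` and `columns_from_keys` and the sample row are hypothetical. Only the allowlist contents, the scalar `param_name`, and the iteration over `configs[0].keys()` come from the description above.

```python
# Hypothetical sketch of the old and new column selection; not the real script.
ALLOWLIST = {"CuTile", "PyTorch", "Triton", "TorchCompile"}


def columns_with_allowlist(configs):
    """Old behavior: a single scalar param_name plus allowlisted backends."""
    param_name = None
    backends = []
    for key in configs[0].keys():
        if key in ALLOWLIST:
            backends.append(key)
        else:
            # Each non-allowlisted key overwrites the previous one,
            # so only the last such key survives.
            param_name = key
    return [param_name] + backends


def columns_from_keys(configs):
    """New behavior: keep every key, in the order the JSON emits it."""
    return list(configs[0].keys())


# Sample row shaped like bench_bmm (the values are made up).
configs = [{"M": 1024, "N": 1024, "K": 512, "CuTile": 1.0, "PyTorch": 2.0}]

old_columns = columns_with_allowlist(configs)
new_columns = columns_from_keys(configs)
print(old_columns)  # ['K', 'CuTile', 'PyTorch']
print(new_columns)  # ['M', 'N', 'K', 'CuTile', 'PyTorch']
```

The sketch relies on Python dicts preserving insertion order, which is why the column order follows whatever order run_all_json.py wrote the keys in.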

Before / After (from CI run 24546853000 artifacts)

bench_attention_backward (CuTile + 3 SDPA backends, plus N_CTX)

  • Before: | SDPA-Math | CuTile | (N_CTX / SDPA-Flash / SDPA-MemEff dropped)
  • After: | N_CTX | CuTile | SDPA-Flash | SDPA-MemEff | SDPA-Math |

bench_bmm (multi-axis M/N/K)

  • Before: | K | CuTile | PyTorch | (M / N dropped)
  • After: | M | N | K | CuTile | PyTorch |

bench_mix_triton_cutile

  • Before: | Triton+CuTile | PyTorch | (N dropped)
  • After: | N | PyTorch | Triton+CuTile |

Every bench that already rendered correctly (e.g. the ~22 single-x CuTile + PyTorch files) is unchanged.
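
To make the "After" headers concrete, here is a hedged sketch of rendering a Markdown table by iterating over the first row's keys. `render_table` is a hypothetical helper, not a function from the repository, and the row values are invented for illustration.

```python
def render_table(configs):
    """Render rows as a Markdown table, one column per key of the first row."""
    columns = list(configs[0].keys())
    header = "| " + " | ".join(columns) + " |"
    separator = "|" + "|".join(" --- " for _ in columns) + "|"
    body = [
        "| " + " | ".join(str(row.get(col, "")) for col in columns) + " |"
        for row in configs
    ]
    return "\n".join([header, separator] + body)


# Made-up rows shaped like bench_attention_backward.
rows = [
    {"N_CTX": 1024, "CuTile": 1.5, "SDPA-Flash": 1.25, "SDPA-MemEff": 2.0, "SDPA-Math": 4.0},
    {"N_CTX": 2048, "CuTile": 3.0, "SDPA-Flash": 2.5, "SDPA-MemEff": 4.0, "SDPA-Math": 8.0},
]
table = render_table(rows)
print(table)
```

Because the header is derived from the keys alone, a bench that adds a new backend gets a new column without any change to the renderer.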

Diff

14 insertions, 17 deletions in a single file. No schema changes, no new imports, no changes to run_all_json.py or check_benchmark_regression.py.

Test plan

  • Patch rendered against real CI artifacts for bench_attention_backward, bench_bmm, bench_mhc, bench_mix_triton_cutile, and bench_swa_attention — all produce expected tables.
  • Every bench whose keys are {x, CuTile, PyTorch} continues to render identically to today.
  • Next tilegym-ci run's summary page shows all backend columns for the affected benches.

Known limitations (deliberately not addressed here)

  • run_all_json.py::parse_benchmark_output still tokenizes the pandas header with .split(), so any future bench whose line_names entry contains whitespace would fragment into multiple columns. Hardening this parser is left to a follow-up.
  • check_benchmark_regression.py has a similar allowlist, but uses break, so it already picks the correct param_name in practice. Leaving it alone to keep this diff focused.
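
Both limitations can be illustrated with a short sketch. The header string, the label "SDPA Flash", and the helper `pick_param_name` are hypothetical and exist only for this example; they are not taken from run_all_json.py or check_benchmark_regression.py.

```python
# 1) Whitespace tokenization: a made-up header where one label contains a space.
header_line = "N_CTX  CuTile  SDPA Flash"
naive_columns = header_line.split()
print(naive_columns)  # ['N_CTX', 'CuTile', 'SDPA', 'Flash'] (four tokens, not three)

# 2) Allowlist with break: the first non-allowlisted key wins, so later
#    non-allowlisted keys cannot overwrite it.
ALLOWLIST = {"CuTile", "PyTorch", "Triton", "TorchCompile"}


def pick_param_name(keys):
    """Return the first key outside the allowlist, or None if there is none."""
    param_name = None
    for key in keys:
        if key not in ALLOWLIST:
            param_name = key
            break
    return param_name


picked = pick_param_name(["N_CTX", "CuTile", "SDPA-Flash", "SDPA-Math"])
print(picked)  # N_CTX
```

The second part only picks the x-axis parameter because the keys arrive with x-axis params first; it would choose a backend name if that ordering ever changed.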

Effect

image

CI Configuration

config:
  build: true
  # valid options are "ops", "benchmark", and "sanity"
  test: ["benchmark"]

Checklist

  • Code formatted and imports sorted via repo specifications (./format.sh)
  • Documentation updated (if needed)
  • CI configuration reviewed

copy-pr-bot Bot commented Apr 21, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@hannahli-nv
Collaborator Author

/ok to test 374a9f4

hannahli-nv requested a review from xjmxyt April 21, 2026 06:53

format_benchmark_summary.py previously used a hardcoded backend allowlist
{CuTile, PyTorch, Triton, TorchCompile}. Any key outside that list was
reclassified as the x-axis parameter, and because param_name is a scalar
that gets overwritten, only the last such key survived while earlier
x-axis columns were silently dropped.

Visible effects in existing CI runs:
- bench_attention_backward: SDPA-Flash / SDPA-MemEff / SDPA-Math and the
  N_CTX column were all lost, leaving | SDPA-Math | CuTile |.
- bench_bmm / group_gemm / matrix_multiplication / persistent_matmul:
  only K survived from (M, N, K); M and N were dropped.
- bench_mix_triton_cutile: N was dropped, leaving
  | Triton+CuTile | PyTorch |.
- bench_mhc: DeepGemm was dropped when the backend was available.

Replace the allowlist with direct iteration over configs[0].keys(),
which run_all_json.py already emits in Triton perf_report column order
(x-axis params first, then backends). Benches that already render
correctly are unaffected.

hannahli-nv force-pushed the fix/benchmark-summary-allowlist branch from 374a9f4 to 60a2145 April 21, 2026 07:25

@hannahli-nv
Collaborator Author

/ok to test 60a2145

hannahli-nv merged commit 479a37d into main Apr 21, 2026
17 checks passed
hannahli-nv deleted the fix/benchmark-summary-allowlist branch April 21, 2026 09:15