Fix some benchmark scripts so that they generate the output CSVs by apicciau · Pull Request #2555 · ROCm/aiter

apicciau · 2026-03-31T11:19:22Z

A bug in the following Triton-based benchmarks prevents them from generating the output CSV file:

bench_moe_gemm_a4w4.py
bench_moe_gemm_a8w4.py
bench_moe_gemm_a8w8.py
bench_moe_gemm_a8w8_blockscale.py
bench_moe_gemm_int8_smoothquant.py

This fix addresses the issue.

Motivation

When you run these scripts, their console output suggests that a CSV file is generated, but it isn't. For example:

~/aiter# python3 op_tests/op_benchmarks/triton/bench_moe_gemm_a4w4.py --shape 4096 14336 --experts 8 2 --op-regex .\*moe_gemm.\* --num-weight-inits 1
[aiter] import [module_aiter_enum] under /root/aiter/aiter/jit/module_aiter_enum.so
=========================================
logs/gpt-oss-x2/mx4x-mx4w-TP1.csv...
=========================================
batch:     1 | Total latency (us): 84.61 | Kernel latency (us): 39.55 | TFLOPS: 8.908 | TBPS: 2.23

However, only the empty directory logs/gpt-oss-x2/mx4x-mx4w-TP1 is created. This change fixes the issue and ensures that logs/gpt-oss-x2/mx4x-mx4w-TP1.csv (or whichever output file a given script is expected to produce) is created.

Technical Details

In all the files, the fix uses Python's csv module to write the result rows one at a time. In some cases, I used the underscore character as a placeholder name to avoid shadowing an existing variable name.
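A minimal sketch of the row-by-row pattern follows, assuming a sweep over batch sizes in which each batch yields one dictionary of metrics. The function `write_sweep_csv`, the `measure` callback, and the column names (modeled on the per-batch console line in the example above) are hypothetical illustrations, not the benchmark scripts' actual code.

```python
import csv
import io
from typing import Callable, Dict, Iterable, TextIO

# Hypothetical column names, modeled on the metrics the scripts print per batch.
FIELDNAMES = ["batch", "total_latency_us", "kernel_latency_us", "tflops", "tbps"]


def write_sweep_csv(
    out: TextIO,
    batches: Iterable[int],
    measure: Callable[[int], Dict[str, float]],
) -> int:
    """Write a header, then one CSV row per batch size, in sweep order.

    Returns the number of data rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    count = 0
    for batch in batches:
        # Hypothetical callback: runs one batch and returns its metrics by column name.
        metrics = measure(batch)
        writer.writerow({"batch": batch, **metrics})
        count += 1
    return count


if __name__ == "__main__":
    # In-memory demo with placeholder metrics (not real measurements).
    buf = io.StringIO()
    write_sweep_csv(
        buf,
        [1, 2],
        lambda b: {"total_latency_us": 0.0, "kernel_latency_us": 0.0, "tflops": 0.0, "tbps": 0.0},
    )
    print(buf.getvalue())
```

To write to disk instead, open the file with `newline=""` (as the `csv` module documentation recommends) and pass the handle as `out`.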

Test Plan

I executed these scripts interactively.

Test Result

In all the scripts involved in this change, the issue is fixed and the CSV file is generated.


github-actions · 2026-03-31T11:20:24Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-355`	Run Triton tests on MI355 in addition to MI325
`ci:sglang`	SGLang integration tests
`ci:atom`	ATOM benchmark (DeepSeek-R1 + GPT-OSS)
`ci:vllm`	vLLM benchmark
`ci:all`	All of the above

Add labels via the sidebar or with `gh pr edit 2555 --add-label <label>`.

Copilot

Pull request overview

Fixes Triton MoE GEMM benchmark scripts that previously looked like they were writing a results CSV, but actually only printed results and/or created an empty output directory. The PR updates the roofline sweep helper to emit CSV rows during/after the sweep and corrects output path handling so the expected logs/<name>/<...>.csv file is created.

Changes:

Add CSV generation via csv.DictWriter to compute_roofline() across the affected benchmark scripts.
Adjust output path construction to create logs/<name>/ and write a .csv file directly (avoiding “directory-as-CSV-stem” behavior).
Minor cleanup (e.g., avoid bytes shadowing via bytes_, loop variable _, small formatting/docstring tweaks).
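The output-path change can be sketched as follows, assuming the scripts build the path from a log root, a run name, and a file stem. `csv_output_path` and its parameters are hypothetical; the sketch only illustrates creating the parent directory and treating the `.csv` path as a file, as opposed to creating a directory named after the stem, which is presumably how the empty logs/gpt-oss-x2/mx4x-mx4w-TP1 directory arose.

```python
import tempfile
from pathlib import Path


def csv_output_path(log_root: Path, name: str, stem: str) -> Path:
    """Return <log_root>/<name>/<stem>.csv, creating only the <log_root>/<name>/ directory."""
    out_dir = log_root / name
    # Create the directory that will hold the CSV (no-op if it already exists).
    out_dir.mkdir(parents=True, exist_ok=True)
    # The CSV itself is a file path; calling mkdir() on log_root/name/stem instead
    # would leave behind an empty directory and no CSV.
    return out_dir / f"{stem}.csv"


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left in the working tree.
    with tempfile.TemporaryDirectory() as tmp:
        path = csv_output_path(Path(tmp) / "logs", "gpt-oss-x2", "mx4x-mx4w-TP1")
        path.write_text("batch\n")
        print(path.relative_to(tmp), path.is_file())
```

The returned path can then be opened with `newline=""` and handed to a `csv.DictWriter`.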

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


File	Description
`op_tests/op_benchmarks/triton/bench_moe_gemm_int8_smoothquant.py`	Writes CSV results for roofline sweep and fixes output path to a concrete `.csv` under `logs/<name>/`.
`op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8.py`	Adds CSV writing for roofline sweep and fixes output CSV pathing under `logs/<name>/`.
`op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8_blockscale.py`	Adds CSV writing and fixes output pathing; also includes substantial line reformatting in the benchmark loop.
`op_tests/op_benchmarks/triton/bench_moe_gemm_a8w4.py`	Adds CSV writing for roofline sweep and fixes output CSV pathing under `logs/<name>/`.
`op_tests/op_benchmarks/triton/bench_moe_gemm_a4w4.py`	Adds CSV writing for roofline sweep and switches output to `logs/<name>/<...>.csv` instead of creating an empty directory.


nsusanto

LGTM

brunomazzottiamd

LGTM!

It seems like these bench_moe_gemm_*.py scripts haven't been run in a long time.

azaidy · 2026-04-01T04:30:39Z

@lburzawa please review

brunomazzottiamd

LGTM! Let's wait for @lburzawa review.

lburzawa

Looks good!

* Fix some benchmark scripts so that they generate the output CSVs
  Affects the following Triton-based benchmarks:
  * bench_moe_gemm_a4w4.py
  * bench_moe_gemm_a8w4.py
  * bench_moe_gemm_a8w8.py
  * bench_moe_gemm_a8w8_blockscale.py
  * bench_moe_gemm_int8_smoothquant.py
* Reformat some MoE GEMM benchmarks with Black
* Change comments to proper type annotations

#2498)
* Add ctypes C-ABI error bridging to prevent worker crashes during kernel tuning. AITER_CHECK and HIP_CALL now throw std::runtime_error instead of calling std::terminate()/exit(0), so exceptions can be caught at the C-ABI boundary. New header aiter_ctypes_error.h provides:
  - AITER_CTYPES_ERROR_DEF: per-TU thread-local error storage + ABI version probe
  - AITER_CTYPES_DEFINE_ENTRYPOINT: macro that generates extern "C" int wrapper with automatic try/catch bridging (developer writes normal function body)
  - aiter_safe_call: template that catches C++ exceptions, stores in TLS, returns -1
  Python side (core.py) probes each .so for aiter_ctypes_abi_version to auto-detect the new int-returning convention and raises RuntimeError on failure. asm_moe_2stage.cu is the first kernel converted as a reference implementation.
* update gemm
* add _VOID macro to define function without return value
* [OPUS] Add gfx950 smem transpose load (#2480)
* OPUS: add gfx950 smem transpose load path. Add smem tr_load/tr_load_if APIs and wire _tr_load to gfx950 ds_read_tr* builtins with scalar/vec dispatch, including clang>=20 u16 support and simplified diagnostics.
* tr_load example layout and unit test
* Fix error checking in aiter_hip_common.h (#2225)
* replace ck_tile api with opus api in some hip kernels (#2533)
* replace ck_tile api with opus api in some hip kernels (topk_softmax, moe_fused_gate. sample)
* update
* rm ck_tile in topk_softmax_kernels_group.cu
---------
Co-authored-by: Xin Huang <Xin.Huang@amd.com>
* Fix some benchmark scripts so that they generate the output CSVs (#2555)
* Fix some benchmark scripts so that they generate the output CSVs. Affects the following Triton-based benchmarks: bench_moe_gemm_a4w4.py, bench_moe_gemm_a8w4.py, bench_moe_gemm_a8w8.py, bench_moe_gemm_a8w8_blockscale.py, bench_moe_gemm_int8_smoothquant.py
* Reformat some MoE GEMM benchmarks with Black
* Change comments to proper type annotations
* fix conflict
* keep abort behavior if not wrap with aiter_safe_call
* abort when hip_call error
* fix format
* rm changes not related
---------
Co-authored-by: Xin Huang <Xin.Huang@amd.com>
Co-authored-by: YANG Kai <106952055+kaiyang-1@users.noreply.github.com>
Co-authored-by: Dragan Mladjenovic <dragan.mladjenovic@amd.com>
Co-authored-by: la <46212055+junhaha666@users.noreply.github.com>
Co-authored-by: Andrea Picciau <andrea.picciau@amd.com>


Fix some benchmark scripts so that they generate the output CSVs

0705120

Affects the following Triton-based benchmarks: * bench_moe_gemm_a4w4.py * bench_moe_gemm_a8w4.py * bench_moe_gemm_a8w8.py * bench_moe_gemm_a8w8_blockscale.py * bench_moe_gemm_int8_smoothquant.py

apicciau requested review from a team, Copilot and nsusanto and removed request for Copilot March 31, 2026 11:19

apicciau self-assigned this Mar 31, 2026

Copilot started reviewing on behalf of apicciau March 31, 2026 11:21

Copilot AI reviewed Mar 31, 2026


Comment thread op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8_blockscale.py Outdated

Comment thread op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8_blockscale.py Outdated

Reformat some MoE GEMM benchmarks with Black

aa58142

brunomazzottiamd added bug Something isn't working triton labels Mar 31, 2026

nsusanto previously approved these changes Mar 31, 2026


brunomazzottiamd previously approved these changes Mar 31, 2026


Comment thread op_tests/op_benchmarks/triton/bench_moe_gemm_a8w4.py Outdated

Comment thread op_tests/op_benchmarks/triton/bench_moe_gemm_int8_smoothquant.py Outdated

azaidy requested a review from lburzawa April 1, 2026 04:30

Change comments to proper type annotations

b1e8a62

apicciau dismissed stale reviews from brunomazzottiamd and nsusanto via b1e8a62 April 1, 2026 08:19

brunomazzottiamd reviewed Apr 1, 2026


lburzawa approved these changes Apr 2, 2026


brunomazzottiamd approved these changes Apr 2, 2026


brunomazzottiamd merged commit 470bbfc into main Apr 2, 2026
38 checks passed

brunomazzottiamd deleted the apicciau/pr_benchrmark_csv branch April 2, 2026 13:13

brunomazzottiamd merged 3 commits into main from apicciau/pr_benchrmark_csv

6 participants
