Fix some benchmark scripts so that they generate the output CSVs #2555
brunomazzottiamd merged 3 commits into main
Conversation
Affects the following Triton-based benchmarks:
* bench_moe_gemm_a4w4.py
* bench_moe_gemm_a8w4.py
* bench_moe_gemm_a8w8.py
* bench_moe_gemm_a8w8_blockscale.py
* bench_moe_gemm_int8_smoothquant.py
Pull request overview
Fixes Triton MoE GEMM benchmark scripts that previously looked like they were writing a results CSV, but actually only printed results and/or created an empty output directory. The PR updates the roofline sweep helper to emit CSV rows during/after the sweep and corrects output path handling so the expected logs/<name>/<...>.csv file is created.
Changes:
- Add CSV generation via `csv.DictWriter` to `compute_roofline()` across the affected benchmark scripts.
- Adjust output path construction to create `logs/<name>/` and write a `.csv` file directly (avoiding “directory-as-CSV-stem” behavior).
- Minor cleanup (e.g., avoid `bytes` shadowing via `bytes_`, loop variable `_`, small formatting/docstring tweaks).
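The output path change listed above can be illustrated with a minimal sketch. It is not the code from the PR: the helper name `csv_output_path` and its parameters are hypothetical, and the only thing taken from the PR is the intended layout, a `logs/<name>/` directory containing a `.csv` file. The point is that only the parent directory is created, so the CSV stem never ends up as an empty directory.

```python
import tempfile
from pathlib import Path


def csv_output_path(logs_root: Path, name: str, stem: str) -> Path:
    """Return logs_root/name/stem.csv, creating only the parent directory.

    Hypothetical helper for illustration; the real scripts may build the
    path differently.
    """
    out_dir = logs_root / name
    # Create logs/<name>/ but NOT logs/<name>/<stem>, which would leave an
    # empty directory where the CSV file was expected.
    out_dir.mkdir(parents=True, exist_ok=True)
    # Append the suffix explicitly so a stem containing dots is left intact.
    return out_dir / f"{stem}.csv"


with tempfile.TemporaryDirectory() as tmp:
    path = csv_output_path(Path(tmp) / "logs", "gpt-oss-x2", "mx4x-mx4w-TP1")
    path.write_text("placeholder\n")
    created_file = path.is_file()
    # The stem itself must not exist as a directory.
    stem_is_dir = path.with_suffix("").is_dir()
```

Running the sketch in a temporary directory leaves a regular file at `logs/gpt-oss-x2/mx4x-mx4w-TP1.csv` and no directory named after the stem.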
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| op_tests/op_benchmarks/triton/bench_moe_gemm_int8_smoothquant.py | Writes CSV results for roofline sweep and fixes output path to a concrete .csv under logs/<name>/. |
| op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8.py | Adds CSV writing for roofline sweep and fixes output CSV pathing under logs/<name>/. |
| op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8_blockscale.py | Adds CSV writing and fixes output pathing; also includes substantial line reformatting in the benchmark loop. |
| op_tests/op_benchmarks/triton/bench_moe_gemm_a8w4.py | Adds CSV writing for roofline sweep and fixes output CSV pathing under logs/<name>/. |
| op_tests/op_benchmarks/triton/bench_moe_gemm_a4w4.py | Adds CSV writing for roofline sweep and switches output to logs/<name>/<...>.csv instead of creating an empty directory. |
brunomazzottiamd
left a comment
LGTM!
It seems like these bench_moe_gemm_*.py scripts haven't been run in a long time.
@lburzawa please review
brunomazzottiamd
left a comment
LGTM! Let's wait for @lburzawa review.
* Fix some benchmark scripts so that they generate the output CSVs

  Affects the following Triton-based benchmarks:
  * bench_moe_gemm_a4w4.py
  * bench_moe_gemm_a8w4.py
  * bench_moe_gemm_a8w8.py
  * bench_moe_gemm_a8w8_blockscale.py
  * bench_moe_gemm_int8_smoothquant.py
* Reformat some MoE GEMM benchmarks with Black
* Change comments to proper type annotations
#2498)
* Add ctypes C-ABI error bridging to prevent worker crashes during kernel tuning

  AITER_CHECK and HIP_CALL now throw std::runtime_error instead of calling std::terminate()/exit(0), so exceptions can be caught at the C-ABI boundary.

  New header aiter_ctypes_error.h provides:
  - AITER_CTYPES_ERROR_DEF: per-TU thread-local error storage + ABI version probe
  - AITER_CTYPES_DEFINE_ENTRYPOINT: macro that generates extern "C" int wrapper with automatic try/catch bridging (developer writes normal function body)
  - aiter_safe_call: template that catches C++ exceptions, stores in TLS, returns -1

  Python side (core.py) probes each .so for aiter_ctypes_abi_version to auto-detect the new int-returning convention and raises RuntimeError on failure.

  asm_moe_2stage.cu is the first kernel converted as a reference implementation.
* update gemm
* add _VOID marco to define function without return value
* [OPUS] Add gfx950 smem transpose load (#2480)
  * OPUS: add gfx950 smem transpose load path

    Add smem tr_load/tr_load_if APIs and wire _tr_load to gfx950 ds_read_tr* builtins with scalar/vec dispatch, including clang>=20 u16 support and simplified diagnostics.
  * tr_load example layout and unit test
* Fix error checking in aiter_hip_common.h (#2225)
* replace ck_tile api with opus api in some hip kernels (#2533)
  * replace ck_tile api with opus api in some hip kernels(topk_softmax, moe_fused_gate. sample)
  * update
  * rm ck_tile in topk_softmax_kernels_group.cu

  ---------
  Co-authored-by: Xin Huang <Xin.Huang@amd.com>
* Fix some benchmark scripts so that they generate the output CSVs (#2555)
  * Fix some benchmark scripts so that they generate the output CSVs

    Affects the following Triton-based benchmarks:
    * bench_moe_gemm_a4w4.py
    * bench_moe_gemm_a8w4.py
    * bench_moe_gemm_a8w8.py
    * bench_moe_gemm_a8w8_blockscale.py
    * bench_moe_gemm_int8_smoothquant.py
  * Reformat some MoE GEMM benchmarks with Black
  * Change comments to proper type annotations
* fix conflict
* keep abort behavior if not wrap with aiter_safe_call
* abort when hip_call error
* fix format
* rm changes not related

---------
Co-authored-by: Xin Huang <Xin.Huang@amd.com>
Co-authored-by: YANG Kai <106952055+kaiyang-1@users.noreply.github.com>
Co-authored-by: Dragan Mladjenovic <dragan.mladjenovic@amd.com>
Co-authored-by: la <46212055+junhaha666@users.noreply.github.com>
Co-authored-by: Andrea Picciau <andrea.picciau@amd.com>
A bug affects the following Triton-based benchmarks, causing them to not generate the output CSV file:
* bench_moe_gemm_a4w4.py
* bench_moe_gemm_a8w4.py
* bench_moe_gemm_a8w8.py
* bench_moe_gemm_a8w8_blockscale.py
* bench_moe_gemm_int8_smoothquant.py

This fix addresses the issue.
Motivation
When one runs these scripts, it appears that a CSV file should be generated, but it isn’t. For example:
However, the only thing that is generated is the empty directory `logs/gpt-oss-x2/mx4x-mx4w-TP1`. This change addresses the issue and ensures `logs/gpt-oss-x2/mx4x-mx4w-TP1.csv` is created (or whatever output file these scripts are expected to generate).
Technical Details
In all the files, the fix involves using the `csv` package to generate rows one by one. In some cases, I used the underscore character as a placeholder to avoid aliasing an existing variable name.
Test Plan
I executed these scripts interactively.
Test Result
In all the scripts involved in this change, the issue is fixed and the CSV file is generated.
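For readers unfamiliar with the row-by-row approach mentioned under Technical Details, here is a minimal sketch of it. It does not reproduce the PR's code: `write_rows`, `fake_sweep`, and the column names `batch` and `bytes` are made up for illustration, and an in-memory buffer stands in for the real output file. Only `csv.DictWriter` and the trailing-underscore naming (`bytes_` instead of shadowing the built-in `bytes`) come from the PR discussion.

```python
import csv
import io


def write_rows(rows, fieldnames, out):
    """Write a header, then one CSV row per result as it is produced."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def fake_sweep(batch_sizes):
    """Hypothetical stand-in for a benchmark sweep; values are not real measurements."""
    for m in batch_sizes:
        # Trailing underscore avoids shadowing the built-in `bytes`.
        bytes_ = m * 1024
        yield {"batch": m, "bytes": bytes_}


# newline="" leaves line endings to the csv module, as recommended for csv files.
buf = io.StringIO(newline="")
n = write_rows(fake_sweep([1, 2, 4]), ["batch", "bytes"], buf)
lines = buf.getvalue().splitlines()
```

Because each row is written as soon as it is available, a sweep that is interrupted partway through still leaves the rows completed so far in the output.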