Fix some benchmark scripts so that they generate the output CSVs#2555

Merged: brunomazzottiamd merged 3 commits into main from apicciau/pr_benchrmark_csv on Apr 2, 2026

@apicciau apicciau commented Mar 31, 2026

A bug affects the following Triton-based benchmarks, causing them to not generate the output CSV file:

  • bench_moe_gemm_a4w4.py
  • bench_moe_gemm_a8w4.py
  • bench_moe_gemm_a8w8.py
  • bench_moe_gemm_a8w8_blockscale.py
  • bench_moe_gemm_int8_smoothquant.py

This fix addresses the issue.

Motivation

Running these scripts prints a CSV path, which suggests that a file is written, but none is. For example:

~/aiter# python3 op_tests/op_benchmarks/triton/bench_moe_gemm_a4w4.py --shape 4096 14336 --experts 8 2 --op-regex .\*moe_gemm.\* --num-weight-inits 1
[aiter] import [module_aiter_enum] under /root/aiter/aiter/jit/module_aiter_enum.so
=========================================
logs/gpt-oss-x2/mx4x-mx4w-TP1.csv...
=========================================
batch:     1 | Total latency (us): 84.61 | Kernel latency (us): 39.55 | TFLOPS: 8.908 | TBPS: 2.23

However, the run produces only the empty directory logs/gpt-oss-x2/mx4x-mx4w-TP1. This change ensures that logs/gpt-oss-x2/mx4x-mx4w-TP1.csv (or whichever output file each script is expected to generate) is actually created.
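One way such a symptom can arise is sketched below, under assumptions: the helpers `buggy_prepare` and `fixed_prepare` are hypothetical and are not taken from the benchmark scripts. If `mkdir` is called on the CSV stem itself rather than on its parent, a directory with that name appears and no `.csv` file is ever written.

```python
import tempfile
from pathlib import Path


def buggy_prepare(root: Path, name: str, stem: str) -> Path:
    """Hypothetical reproduction: creates a directory named after the CSV stem."""
    out = root / "logs" / name / stem
    out.mkdir(parents=True, exist_ok=True)  # this makes a directory, not a file
    return out


def fixed_prepare(root: Path, name: str, stem: str) -> Path:
    """Create only the parent directory and return a concrete .csv path."""
    out_dir = root / "logs" / name
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir / f"{stem}.csv"


# Work in a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    bad = buggy_prepare(root, "gpt-oss-x2", "mx4x-mx4w-TP1")
    good = fixed_prepare(root, "gpt-oss-x2-fixed", "mx4x-mx4w-TP1")
    good.write_text("batch,tflops\n")  # placeholder content for the check
    bad_is_dir = bad.is_dir()
    good_is_file = good.is_file()
```

In this sketch `bad` ends up as an empty directory, matching the behavior described above, while `good` is a regular file with a `.csv` suffix.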

Technical Details

In every file, the fix uses Python's csv module to write the rows one by one. In some places, I used an underscore as a placeholder name to avoid shadowing an existing variable name.
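A minimal sketch of row-by-row CSV writing with the standard `csv` module follows. The column names, the `write_rows` helper, and the sample row (values borrowed from the console output in the Motivation section) are illustrative assumptions; the actual scripts may use different fields.

```python
import csv
import io

# Assumed column names; the real benchmark scripts may differ.
FIELDS = ["batch", "total_latency_us", "kernel_latency_us", "tflops", "tbps"]


def write_rows(stream, rows):
    """Write a header, then one CSV row per result dictionary."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# An in-memory buffer stands in for the output file in this sketch.
buf = io.StringIO()
write_rows(
    buf,
    [
        {
            "batch": 1,
            "total_latency_us": 84.61,
            "kernel_latency_us": 39.55,
            "tflops": 8.908,
            "tbps": 2.23,
        }
    ],
)
csv_text = buf.getvalue()
```

When writing to a real file instead of a buffer, the `csv` documentation recommends opening it with `newline=""` so that line endings are not translated twice.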

Test Plan

I executed these scripts interactively.

Test Result

In all the scripts involved in this change, the issue is fixed and the CSV file is generated.

@apicciau apicciau requested review from a team, Copilot and nsusanto and removed request for Copilot March 31, 2026 11:19
@apicciau apicciau self-assigned this Mar 31, 2026
@github-actions

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

| Label | Tests |
| --- | --- |
| ci:triton-355 | Run Triton tests on MI355 in addition to MI325 |
| ci:sglang | SGLang integration tests |
| ci:atom | ATOM benchmark (DeepSeek-R1 + GPT-OSS) |
| ci:vllm | vLLM benchmark |
| ci:all | All of the above |

Add labels via the sidebar or gh pr edit 2555 --add-label <label>

Copilot AI left a comment

Pull request overview

Fixes Triton MoE GEMM benchmark scripts that previously looked like they were writing a results CSV, but actually only printed results and/or created an empty output directory. The PR updates the roofline sweep helper to emit CSV rows during/after the sweep and corrects output path handling so the expected logs/<name>/<...>.csv file is created.

Changes:

  • Add CSV generation via csv.DictWriter to compute_roofline() across the affected benchmark scripts.
  • Adjust output path construction to create logs/<name>/ and write a .csv file directly (avoiding “directory-as-CSV-stem” behavior).
  • Minor cleanup (e.g., avoid bytes shadowing via bytes_, loop variable _, small formatting/docstring tweaks).
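The naming cleanup in the last bullet can be illustrated with a small sketch. The function `traffic_in_bytes` and its arguments are hypothetical and do not come from the benchmark scripts. A trailing underscore keeps the built-in `bytes` usable, and `_` marks a loop variable whose value is never read.

```python
def traffic_in_bytes(num_elements: int, element_size: int, repeats: int) -> int:
    """Hypothetical helper: total bytes moved over `repeats` identical passes."""
    bytes_ = 0  # trailing underscore avoids shadowing the built-in `bytes`
    for _ in range(repeats):  # `_` signals that the loop index is unused
        bytes_ += num_elements * element_size
    return bytes_


total = traffic_in_bytes(num_elements=1024, element_size=2, repeats=3)
payload = bytes(4)  # the built-in is still callable at this point
```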

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| op_tests/op_benchmarks/triton/bench_moe_gemm_int8_smoothquant.py | Writes CSV results for roofline sweep and fixes output path to a concrete .csv under logs/<name>/. |
| op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8.py | Adds CSV writing for roofline sweep and fixes output CSV pathing under logs/<name>/. |
| op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8_blockscale.py | Adds CSV writing and fixes output pathing; also includes substantial line reformatting in the benchmark loop. |
| op_tests/op_benchmarks/triton/bench_moe_gemm_a8w4.py | Adds CSV writing for roofline sweep and fixes output CSV pathing under logs/<name>/. |
| op_tests/op_benchmarks/triton/bench_moe_gemm_a4w4.py | Adds CSV writing for roofline sweep and switches output to logs/<name>/<...>.csv instead of creating an empty directory. |

Comment thread op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8_blockscale.py Outdated
Comment thread op_tests/op_benchmarks/triton/bench_moe_gemm_a8w8_blockscale.py Outdated
@brunomazzottiamd brunomazzottiamd added bug Something isn't working triton labels Mar 31, 2026
nsusanto previously approved these changes Mar 31, 2026
@nsusanto nsusanto left a comment

LGTM

@brunomazzottiamd brunomazzottiamd left a comment

LGTM!

It seems like these bench_moe_gemm_*.py scripts haven't been run in a long time.

Comment thread op_tests/op_benchmarks/triton/bench_moe_gemm_a8w4.py Outdated
Comment thread op_tests/op_benchmarks/triton/bench_moe_gemm_int8_smoothquant.py Outdated
@azaidy azaidy requested a review from lburzawa April 1, 2026 04:30
azaidy commented Apr 1, 2026

@lburzawa please review

@apicciau apicciau dismissed stale reviews from brunomazzottiamd and nsusanto via b1e8a62 April 1, 2026 08:19
@brunomazzottiamd brunomazzottiamd left a comment

LGTM! Let's wait for @lburzawa review.

@lburzawa lburzawa left a comment

Looks good!

@brunomazzottiamd brunomazzottiamd merged commit 470bbfc into main Apr 2, 2026
38 checks passed
@brunomazzottiamd brunomazzottiamd deleted the apicciau/pr_benchrmark_csv branch April 2, 2026 13:13
yzhou103 pushed a commit that referenced this pull request Apr 2, 2026
* Fix some benchmark scripts so that they generate the output CSVs

Affects the following Triton-based benchmarks:
* bench_moe_gemm_a4w4.py
* bench_moe_gemm_a8w4.py
* bench_moe_gemm_a8w8.py
* bench_moe_gemm_a8w8_blockscale.py
* bench_moe_gemm_int8_smoothquant.py

* Reformat some MoE GEMM benchmarks with Black

* Change comments to proper type annotations
yzhou103 added a commit that referenced this pull request Apr 8, 2026
#2498)

* Add ctypes C-ABI error bridging to prevent worker crashes during kernel tuning

AITER_CHECK and HIP_CALL now throw std::runtime_error instead of calling
std::terminate()/exit(0), so exceptions can be caught at the C-ABI boundary.

New header aiter_ctypes_error.h provides:
- AITER_CTYPES_ERROR_DEF: per-TU thread-local error storage + ABI version probe
- AITER_CTYPES_DEFINE_ENTRYPOINT: macro that generates extern "C" int wrapper
  with automatic try/catch bridging (developer writes normal function body)
- aiter_safe_call: template that catches C++ exceptions, stores in TLS, returns -1

Python side (core.py) probes each .so for aiter_ctypes_abi_version to auto-detect
the new int-returning convention and raises RuntimeError on failure.

asm_moe_2stage.cu is the first kernel converted as a reference implementation.

* update gemm

* add _VOID macro to define function without return value

* [OPUS] Add gfx950 smem transpose load (#2480)

* OPUS: add gfx950 smem transpose load path

Add smem tr_load/tr_load_if APIs and wire _tr_load to gfx950 ds_read_tr* builtins with scalar/vec dispatch, including clang>=20 u16 support and simplified diagnostics.

* tr_load example layout and unit test

* Fix error checking in aiter_hip_common.h (#2225)

* replace ck_tile api with opus api in some hip kernels (#2533)

* replace ck_tile api with opus api in some hip kernels(topk_softmax, moe_fused_gate. sample)

* update

* rm ck_tile in topk_softmax_kernels_group.cu

---------

Co-authored-by: Xin Huang <Xin.Huang@amd.com>

* Fix some benchmark scripts so that they generate the output CSVs (#2555)

* Fix some benchmark scripts so that they generate the output CSVs

Affects the following Triton-based benchmarks:
* bench_moe_gemm_a4w4.py
* bench_moe_gemm_a8w4.py
* bench_moe_gemm_a8w8.py
* bench_moe_gemm_a8w8_blockscale.py
* bench_moe_gemm_int8_smoothquant.py

* Reformat some MoE GEMM benchmarks with Black

* Change comments to proper type annotations

* fix conflict

* keep abort behavior if not wrap with aiter_safe_call

* abort when hip_call error

* fix format

* rm changes not related

---------

Co-authored-by: Xin Huang <Xin.Huang@amd.com>
Co-authored-by: YANG Kai <106952055+kaiyang-1@users.noreply.github.com>
Co-authored-by: Dragan Mladjenovic <dragan.mladjenovic@amd.com>
Co-authored-by: la <46212055+junhaha666@users.noreply.github.com>
Co-authored-by: Andrea Picciau <andrea.picciau@amd.com>
yzhou103 pushed a commit that referenced this pull request Apr 8, 2026
* Fix some benchmark scripts so that they generate the output CSVs

Affects the following Triton-based benchmarks:
* bench_moe_gemm_a4w4.py
* bench_moe_gemm_a8w4.py
* bench_moe_gemm_a8w8.py
* bench_moe_gemm_a8w8_blockscale.py
* bench_moe_gemm_int8_smoothquant.py

* Reformat some MoE GEMM benchmarks with Black

* Change comments to proper type annotations