Reformat a2asweep output to match gfxsweep style by nileshnegi · Pull Request #272 · ROCm/TransferBench

nileshnegi · 2026-04-27T04:25:53Z

Motivation

Unify a2asweep output with the better-formatted gfxsweep output.
Also adds a summary table at the end.

Technical Details

- Replace printf with Utils::Print throughout the sweep table
- Use sep-character column separators (space vs comma for CSV mode)
- Move blocksize out of row prefix into the header line, print header per blocksize block like gfxsweep does
- Print summary block at end
- Default BLOCKSIZES changed from {256} to {256,512,768,1024}
- Single unified table: header printed once, blockSize is a row column
- USE_HIP_EVENTS controls timing mode (preset default: 1 = GPU-event-timed)
  - USE_HIP_EVENTS=1: GPU-event-timed per-executor minBw; SHOW_MIN_ONLY=0 adds a maxBw column per SE
  - USE_HIP_EVENTS=0: CPU wall-clock avgTotalBandwidthGbPerSec
- Banner and best-result summary reflect the active timing mode
- Results map keyed by (blockSize, numSes, unroll) for verbose output
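The CSV-aware separator and the "blockSize as a row column" layout can be illustrated with a minimal sketch. `FormatSweepRow`, its parameters, and the column widths are hypothetical and are not taken from AllToAllSweep.hpp; the sketch also uses `std::snprintf` rather than Utils::Print so that it stays self-contained.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not the PR's code): formats one row of a unified sweep
// table. blockSize is an ordinary column, and the separator character depends
// on whether CSV output was requested.
std::string FormatSweepRow(int blockSize, int numSubExecs, int unroll,
                           double minBwGbPerSec, bool csvMode)
{
  char const sep = csvMode ? ',' : ' ';   // comma for CSV mode, space otherwise
  char buf[128];
  std::snprintf(buf, sizeof(buf), "%9d%c%6d%c%6d%c%8.2f",
                blockSize, sep, numSubExecs, sep, unroll, sep, minBwGbPerSec);
  return std::string(buf);
}
```

With these assumed widths, `FormatSweepRow(256, 4, 2, 1234.5, true)` yields `      256,     4,     2, 1234.50`, and the same call with `csvMode = false` yields the same text with spaces in place of the commas.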

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

- Default BLOCKSIZES changed from {256} to {256,512,768,1024}
- Single unified table: header printed once, blockSize is a row column
- USE_HIP_EVENTS controls timing mode (preset default: 1 = GPU-event timed)
  - USE_HIP_EVENTS=1: GPU-event-timed per-executor minBw; SHOW_MIN_ONLY=0 adds a maxBw column per SE
  - USE_HIP_EVENTS=0: CPU wall-clock avgTotalBandwidthGbPerSec
- Banner and best-result summary reflect the active timing mode
- Results map keyed by (blockSize, numSes, unroll) for verbose output
- Increase column width for output values
- Fixed [WARN} typo

Co-authored-by: Claude <claude@anthropic.com>
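A results map keyed by (blockSize, numSes, unroll) could look roughly like the sketch below. `SweepResult`, `StoreResult`, and `FindResult` are hypothetical stand-ins; the actual preset stores TransferBench's own result type, whose layout is not shown on this page.

```cpp
#include <map>
#include <tuple>

// Hypothetical stand-in for the per-run result stored by the preset.
struct SweepResult {
  double minBwGbPerSec;
};

// Key order follows the commit message: (blockSize, numSubExecs, unroll).
using SweepKey     = std::tuple<int, int, int>;
using SweepResults = std::map<SweepKey, SweepResult>;

// Stores (or overwrites) the result for one sweep point.
void StoreResult(SweepResults& results, int blockSize, int numSubExecs,
                 int unroll, SweepResult const& result)
{
  results[std::make_tuple(blockSize, numSubExecs, unroll)] = result;
}

// Returns a pointer to the stored result, or nullptr if that sweep point
// was never recorded.
SweepResult const* FindResult(SweepResults const& results, int blockSize,
                              int numSubExecs, int unroll)
{
  auto it = results.find(std::make_tuple(blockSize, numSubExecs, unroll));
  return it == results.end() ? nullptr : &it->second;
}
```

Because `std::tuple` compares lexicographically, iterating the map visits entries grouped by blockSize, then numSubExecs, then unroll, which is a convenient order for printing verbose output.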

Copilot

Pull request overview

This PR updates the a2asweep preset output formatting to align more closely with the gfxsweep style, including a unified table layout and an end-of-run “best result” summary, while also expanding the default blocksize sweep set.

Changes:

Set a default USE_HIP_EVENTS behavior for a2asweep and reflect the active timing mode in the banner and output.
Reformat sweep output into a single table with CSV-aware separators and a “best bandwidth” summary block.
Expand default BLOCKSIZES from {256} to {256,512,768,1024}.
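As an illustration of how a timing-mode switch with a default of 1 might be read and reflected in a banner, here is a minimal sketch. `ParseUseHipEvents` and `TimingModeBanner` are hypothetical names, and the banner wording is invented; TransferBench has its own environment-variable handling, which this does not reproduce.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: interprets the raw value of USE_HIP_EVENTS.
// An unset variable falls back to 1 (GPU-event-timed), the preset default
// stated in the PR description.
bool ParseUseHipEvents(char const* rawValue)
{
  if (rawValue == nullptr) return true;
  return std::atoi(rawValue) != 0;
}

// Hypothetical banner text naming the metric reported in each timing mode.
std::string TimingModeBanner(bool useHipEvents)
{
  return useHipEvents ? "Timing: GPU events (per-executor minBw)"
                      : "Timing: CPU wall clock (avgTotalBandwidthGbPerSec)";
}
```

A caller would typically pass the environment lookup straight in, for example `TimingModeBanner(ParseUseHipEvents(std::getenv("USE_HIP_EVENTS")))`.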

- Fix verbose block using stale numSubExecs: copy transfers per (blockSize,c,unroll) combination before calling PrintResults so subexec count matches stored result
- Gate results map insertion on verbose flag to avoid storing all TestResults when VERBOSE=0
- Guard best-result summary block on bestBlock != -1 to suppress misleading -1 output if all RunTransfers calls fail
- Widen value columns from %7.2f to %8.2f to accommodate 4-digit GB/s values
- Add note on spray targetCount asymmetry for non-uniform A2A_DIRECT topologies

Co-authored-by: Claude <claude@anthropic.com>
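The sentinel guard on the best-result summary can be sketched as follows. `SweepPoint` and `BestResultSummary` are hypothetical, as is the summary wording; only the idea of initialising `bestBlock` to -1 and skipping the summary when it is still -1 comes from the commit message.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record of one sweep point; `succeeded` stands in for whether
// the run for that point returned successfully.
struct SweepPoint {
  int    blockSize;
  int    numSubExecs;
  int    unroll;
  double bwGbPerSec;
  bool   succeeded;
};

// Returns a one-line best-result summary, or an empty string when no run
// succeeded, so that no misleading "-1" values are ever printed.
std::string BestResultSummary(std::vector<SweepPoint> const& points)
{
  int    bestBlock = -1, bestSubExecs = -1, bestUnroll = -1;
  double bestBw = 0.0;
  for (SweepPoint const& p : points) {
    if (!p.succeeded) continue;                       // failed runs never win
    if (bestBlock == -1 || p.bwGbPerSec > bestBw) {
      bestBlock    = p.blockSize;
      bestSubExecs = p.numSubExecs;
      bestUnroll   = p.unroll;
      bestBw       = p.bwGbPerSec;
    }
  }
  if (bestBlock == -1) return std::string();          // guard: nothing to report
  char buf[160];
  std::snprintf(buf, sizeof(buf),
                "Best bandwidth: %.2f GB/s (blockSize=%d, numSubExecs=%d, unroll=%d)",
                bestBw, bestBlock, bestSubExecs, bestUnroll);
  return std::string(buf);
}
```

On the column-width change: a value such as 1234.56 is already seven characters wide, so `%7.2f` leaves no padding and adjacent columns touch, whereas `%8.2f` keeps one leading space.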

- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: AtlantaPepsi <timhu102@gmail.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

nileshnegi requested a review from a team as a code owner April 27, 2026 04:25

Copilot AI review requested due to automatic review settings April 27, 2026 04:25

Copilot started reviewing on behalf of nileshnegi April 27, 2026 04:27

Copilot AI reviewed Apr 27, 2026

Comment thread src/client/Presets/AllToAllSweep.hpp Outdated

Comment thread src/client/Presets/AllToAllSweep.hpp Outdated

Comment thread src/client/Presets/AllToAllSweep.hpp

Comment thread src/client/Presets/AllToAllSweep.hpp Outdated

gilbertlee-amd approved these changes Apr 27, 2026

nileshnegi merged commit 1281d0c into candidate Apr 27, 2026
4 checks passed

nileshnegi deleted the users/nileshnegi/fix/a2asweep-output-format branch April 27, 2026 05:23

nileshnegi mentioned this pull request Apr 27, 2026 in TransferBench v1.67.0 #273 (open, 1 task)

nileshnegi merged 2 commits into candidate from users/nileshnegi/fix/a2asweep-output-format
