Reformat a2asweep output to match gfxsweep style #272

Merged
nileshnegi merged 2 commits into candidate from
users/nileshnegi/fix/a2asweep-output-format
Apr 27, 2026

@nileshnegi nileshnegi commented Apr 27, 2026

Motivation

Unify a2asweep output with the better-formatted gfxsweep output.
Also adds a summary table at the end.

Technical Details

  • Replace printf with Utils::Print throughout the sweep table
  • Use a sep character as the column separator (space by default, comma in CSV mode)
  • Move blocksize out of row prefix into the header line, print header per blocksize block like gfxsweep does
  • Print summary block at end
  • Default BLOCKSIZES changed from {256} to {256,512,768,1024}
  • Single unified table: header printed once, blockSize is a row column
  • USE_HIP_EVENTS controls timing mode (preset default: 1 = GPU-event-timed)
    • USE_HIP_EVENTS=1: GPU-event-timed per-executor minBw; SHOW_MIN_ONLY=0 adds a maxBw column per SE
    • USE_HIP_EVENTS=0: CPU wall-clock avgTotalBandwidthGbPerSec
  • Banner and best-result summary reflect the active timing mode
  • Results map keyed by (blockSize, numSes, unroll) for verbose output
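
The separator handling described above can be sketched as follows. This is a minimal illustration, not the code from AllToAllSweep.hpp: `FormatRow` and its parameters are hypothetical names, the column widths are assumptions, and plain `snprintf` stands in for `Utils::Print`, whose signature is not shown in this PR.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not from the PR): formats one row of the sweep table.
// The same format string serves both modes; only the separator changes.
std::string FormatRow(int blockSize, int numSubExecs,
                      const std::vector<double>& bandwidths, bool csvMode)
{
  assert(blockSize > 0);
  const char sep = csvMode ? ',' : ' ';   // comma in CSV mode, space otherwise

  char buf[64];
  // blockSize is an ordinary row column, followed by the subexecutor count
  std::snprintf(buf, sizeof(buf), "%9d%c%6d", blockSize, sep, numSubExecs);
  std::string row(buf);

  // One fixed-width value column per bandwidth entry (assumed width: 8)
  for (double bw : bandwidths) {
    std::snprintf(buf, sizeof(buf), "%c%8.2f", sep, bw);
    row += buf;
  }
  return row;
}
```

Keeping the separator as a single character lets the human-readable and CSV layouts share one code path, so the columns stay aligned in both.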

Test Plan

Test Result

Submission Checklist

- Default BLOCKSIZES changed from {256} to {256,512,768,1024}
- Single unified table: header printed once, blockSize is a row column
- USE_HIP_EVENTS controls timing mode (preset default: 1 = GPU-event timed)
    USE_HIP_EVENTS=1: GPU-event-timed per-executor minBw; SHOW_MIN_ONLY=0
                      adds a maxBw column per SE
    USE_HIP_EVENTS=0: CPU wall-clock avgTotalBandwidthGbPerSec
- Banner and best-result summary reflect the active timing mode
- Results map keyed by (blockSize, numSes, unroll) for verbose output
- Increase column width for output values
- Fixed [WARN} typo

Co-authored-by: Claude <claude@anthropic.com>
@nileshnegi nileshnegi requested a review from a team as a code owner April 27, 2026 04:25
Copilot AI review requested due to automatic review settings April 27, 2026 04:25

Copilot AI left a comment

Pull request overview

This PR updates the a2asweep preset output formatting to align more closely with the gfxsweep style, including a unified table layout and an end-of-run “best result” summary, while also expanding the default blocksize sweep set.

Changes:

  • Set a preset default for USE_HIP_EVENTS in a2asweep and reflect the active timing mode in the banner/output.
  • Reformat sweep output into a single table with CSV-aware separators and a “best bandwidth” summary block.
  • Expand default BLOCKSIZES from {256} to {256,512,768,1024}.
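
As a rough sketch of how such preset defaults might be resolved, the snippet below reads both variables with `std::getenv` and falls back to the defaults named in this PR (USE_HIP_EVENTS=1, BLOCKSIZES={256,512,768,1024}). The helper names, the `SweepConfig` struct, and the comma-separated list syntax are assumptions for illustration; the preset itself may parse its environment differently.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns defaultValue when the variable is unset or empty.
int ParseIntOrDefault(const char* value, int defaultValue)
{
  if (value == nullptr || *value == '\0') return defaultValue;
  return std::atoi(value);
}

// Hypothetical helper: parses a comma-separated list such as "256,512".
// The list syntax is an assumption, not taken from the PR.
std::vector<int> ParseListOrDefault(const char* value, std::vector<int> defaults)
{
  if (value == nullptr || *value == '\0') return defaults;
  std::vector<int> result;
  std::stringstream ss(value);
  std::string token;
  while (std::getline(ss, token, ',')) {
    if (!token.empty()) result.push_back(std::atoi(token.c_str()));
  }
  return result.empty() ? defaults : result;
}

struct SweepConfig {
  bool             useHipEvents;  // true: GPU-event timing, false: CPU wall-clock
  std::vector<int> blockSizes;
};

SweepConfig LoadSweepConfig()
{
  SweepConfig cfg;
  cfg.useHipEvents = ParseIntOrDefault(std::getenv("USE_HIP_EVENTS"), 1) != 0;
  cfg.blockSizes   = ParseListOrDefault(std::getenv("BLOCKSIZES"), {256, 512, 768, 1024});
  assert(!cfg.blockSizes.empty());
  return cfg;
}
```

Separating the parsing helpers from the `getenv` calls keeps them easy to exercise without touching the process environment.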

Comment threads on src/client/Presets/AllToAllSweep.hpp: 4 (3 marked Outdated)
- Fix verbose block using stale numSubExecs: copy transfers per (blockSize,c,unroll)
  combination before calling PrintResults so subexec count matches stored result
- Gate results map insertion on verbose flag to avoid storing all TestResults
  when VERBOSE=0
- Guard best-result summary block on bestBlock != -1 to suppress misleading
  -1 output if all RunTransfers calls fail
- Widen value columns from %7.2f to %8.2f to accommodate 4-digit GB/s values
- Add note on spray targetCount asymmetry for non-uniform A2A_DIRECT topologies

Co-authored-by: Claude <claude@anthropic.com>
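
Two of the fixes in the commit above, gating the results map on the verbose flag and guarding the summary on `bestBlock != -1`, can be illustrated with the sketch below. All type and function names are hypothetical, and a plain `double` stands in for the stored TestResults; the real preset's data structures are not shown in this PR.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical record of one sweep point; ok == false models a failed run.
struct SweepPoint {
  int blockSize; int numSubExecs; int unroll; double bandwidth; bool ok;
};

// -1 is the sentinel meaning "no successful run was seen".
struct BestResult {
  int blockSize = -1; int numSubExecs = -1; int unroll = -1; double bandwidth = 0.0;
};

using ResultKey = std::tuple<int, int, int>;  // (blockSize, numSubExecs, unroll)

BestResult Summarize(const std::vector<SweepPoint>& points, bool verbose,
                     std::map<ResultKey, double>& stored)
{
  BestResult best;
  for (const SweepPoint& p : points) {
    if (!p.ok) continue;                      // skip failed runs entirely
    assert(p.bandwidth >= 0.0);
    if (verbose) {                            // store results only when they will be printed
      stored[std::make_tuple(p.blockSize, p.numSubExecs, p.unroll)] = p.bandwidth;
    }
    if (best.blockSize == -1 || p.bandwidth > best.bandwidth) {
      best.blockSize   = p.blockSize;
      best.numSubExecs = p.numSubExecs;
      best.unroll      = p.unroll;
      best.bandwidth   = p.bandwidth;
    }
  }
  return best;
}

// The summary block should be printed only when this returns true.
bool HasBest(const BestResult& best) { return best.blockSize != -1; }
```

With this shape, a sweep in which every run fails leaves the sentinel untouched, so the caller can skip the summary instead of printing -1 values.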
@nileshnegi nileshnegi merged commit 1281d0c into candidate Apr 27, 2026
4 checks passed
@nileshnegi nileshnegi deleted the users/nileshnegi/fix/a2asweep-output-format branch April 27, 2026 05:23
@nileshnegi nileshnegi mentioned this pull request Apr 27, 2026
1 task
nileshnegi added a commit that referenced this pull request May 2, 2026
- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset  (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: AtlantaPepsi <timhu102@gmail.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>